alignment and endian issues

wyn...@gmail.com

Apr 17, 2018, 5:08:55 AM

Can is_valid2(..) replace is_valid(..)?
I'm concerned about alignment and endian issues on different CPUs.

// data always points to character sequence of length >=8
bool is_valid(const char* data)
{
    return (!(data[0]&'\x80'))&&
           (!(data[1]&'\x80'))&&
           (!(data[2]&'\x80'))&&
           (!(data[3]&'\x80'))&&
           (!(data[4]&'\x80'))&&
           (!(data[5]&'\x80'))&&
           (!(data[6]&'\x80'))&&
           (!(data[7]&'\x80'));
};

bool is_valid2(const char* data)
{
    return *reinterpret_cast<const uint64_t*>(data)&0x8080808080808080L;
};

David Brown

Apr 17, 2018, 5:20:29 AM

Endian issues are not going to be a problem. C and C++ allow for a lot
of flexibility in the representation of integer types, but uint64_t (and
similar types) are far stricter.

But alignment /will/ be a problem on some platforms. Some cpus are
happy with a non-aligned access, others are not. Even on platforms
which are mostly happy (such as x86), some instructions (certain SIMD
operations) require strict alignment.

So unless you are sure that "data" is 8-byte aligned, you risk problems.

Also, I think, you have your logic inverted somewhere. But that should
be easily solved by an extra cup of coffee for whichever one of us has
got it wrong :-)

wyn...@gmail.com

Apr 17, 2018, 5:22:18 AM

On Tuesday, April 17, 2018 at 5:08:55 PM UTC+8, wyn...@gmail.com wrote:

// Correction: is_valid2(..)
bool is_valid2(const char* data)
{
    return !*reinterpret_cast<const uint64_t*>(data)&0x8080808080808080L;
};

And similarly, if data points to a 4-character sequence, can it be
reinterpreted as a uint32_t in a very similar function?

Paavo Helde

Apr 17, 2018, 6:29:45 AM

Alignment mismatch would be a real danger on some platforms.

What about this replacement, which is also basically a one-liner and does
not suffer from alignment issues:

#include <algorithm>   // std::find_if

bool is_valid(const char* data) {
    return std::find_if(data, data+8,
        [](char c) {return c&'\x80';})==data+8;
}

Chris Vine

Apr 17, 2018, 6:30:00 AM

is_valid2() is technically undefined behaviour unless the object
pointed to by the 'data' argument began life as a uint64_t object,
otherwise dereferencing the return value of the reinterpret_cast
expression breaches the strict aliasing rules. However in practice
that doesn't matter unless is_valid2() is an inline function. If its
definition is in a different translation unit, the compiler cannot
deduce its dynamic type anyway so you are fine.

If you want the operation to be done inline then the bullet-proof and
standard conforming approach is to memcpy() the 8 bytes of 'data' into a
uint64_t object and bitwise-and that. The compiler will optimize out
the memcpy() and produce optimal code if 'data' was correctly aligned
and isn't mutated; if not it will at least end up correctly aligned. An
alternative is to type pun through a union and rely on gcc's and clang's
language extension which allows this.

Chris
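
The memcpy() approach described above can be sketched roughly as follows. This is only an illustration, not code posted in the thread: the name is_valid_memcpy is made up, and it assumes, as the original question does, that data points to at least 8 readable bytes.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical name; data must point to at least 8 readable bytes.
bool is_valid_memcpy(const char* data)
{
    std::uint64_t word;
    // Copying the bytes into a real uint64_t object sidesteps both the
    // strict aliasing rule and any alignment requirement on data.
    std::memcpy(&word, data, sizeof word);
    // The mask has the same bit pattern in every byte, so the result
    // does not depend on byte order.
    return (word & UINT64_C(0x8080808080808080)) == 0;
}
```

Because the copy goes through a local object, the function can be called on any byte offset, for example is_valid_memcpy(buffer + 1).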

David Brown

Apr 17, 2018, 7:38:53 AM

I don't know the details of the C++ standard well enough to know about that.

> However in practice
> that doesn't matter unless is_valid2() is an inline function. If its
> definition is in a different translation unit, the compiler cannot
> deduce its dynamic type anyway so you are fine.
>
> If you want the operation to be done inline then the bullet-proof and
> standard conforming approach is to memcpy() the 8 bytes of 'data' into a
> uint64_t object and bitwise-and that. The compiler will optimize out
> the memcpy() and produce optimal code if 'data' was correctly aligned
> and isn't mutated; if not it will at least end up correctly aligned.

Agreed.

> An
> alternative is to type pun through a union and rely on gcc's and clang's
> language extension which allows this.

I don't see how that could work without having either aliasing or
alignment problems. Maybe using "packed" and "may_alias" attributes
would help. But the memcpy seems simpler.

>
> Chris
>

Chris Vine

Apr 17, 2018, 8:05:03 AM

On Tue, 17 Apr 2018 13:38:41 +0200

Constructing a union would have the same effect as memcpy() in
practice. If alignment is correct then construction of the union can
be elided. Otherwise the type punning union will have to be
constructed on the stack in which case it is obliged to have the
correct alignment for all its members. It doesn't have an aliasing
problem because the gcc/clang language extension says it doesn't (see
also the sixth bullet of §3.10/10 of the C++ standard).

memcpy() is stupendously fast on modern hardware, being an
"intrinsic" (VS) or "built-in" (gcc/clang) which where relevant will do
a direct memory blit rather than have effect as a function call. And
because it is a built-in it can (and will) be trivially elided if not
necessary, as in the case of is_valid2().

Given that memcpy() is standard conforming and a union relies on an
extension I would go for the former. I would wager that when tested it
will turn out considerably faster than is_valid() and at least as fast
as is_valid2() with a reinterpret_cast. Measurement by the OP is easy
here and will reveal all.

Chris

Öö Tiib

Apr 17, 2018, 8:16:33 AM

On Tuesday, 17 April 2018 12:20:29 UTC+3, David Brown wrote:
>
> Also, I think, you have your logic inverted somewhere. But that should
> be easily solved by an extra cup of coffee for whichever one of us has
> got it wrong :-)

To me it seemed that (either coffee or) the usual comparison trick to
check that all bits were set was missing:

return (*reinterpret_cast<const uint64_t*>(data)&0x8080808080808080ULL)
== 0x8080808080808080LLU;

Alf P. Steinbach

Apr 17, 2018, 8:35:18 AM

If a byte is 8 bits and the data is properly aligned so that you avoid
UB for that, the two functions still compute two different things.

Consider an input of all zeroes. `is_valid` then returns true while
`is_valid2` returns false.

Cheers & hth.,

- Alf
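
To make the difference concrete, here is a small sketch of the three predicates that have come up so far: "any top bit set" (what the first is_valid2() computes), "no top bit set" (what is_valid() computes), and "every top bit set" (the all-bits comparison suggested earlier in the thread). The helper load8 and the three function names are hypothetical, and the load uses memcpy() so the sketch itself stays free of aliasing and alignment problems.

```cpp
#include <cstdint>
#include <cstring>

constexpr std::uint64_t kHighBits = UINT64_C(0x8080808080808080);

// Hypothetical helper: read 8 bytes into a uint64_t without a pointer cast.
inline std::uint64_t load8(const char* data)
{
    std::uint64_t w;
    std::memcpy(&w, data, sizeof w);
    return w;
}

// True if ANY of the 8 bytes has its top bit set.
inline bool any_high_bit(const char* data)
{
    return (load8(data) & kHighBits) != 0;
}

// True if NONE of the 8 bytes has its top bit set (the is_valid() contract).
inline bool no_high_bit(const char* data)
{
    return (load8(data) & kHighBits) == 0;
}

// True only if EVERY one of the 8 bytes has its top bit set.
inline bool all_high_bits(const char* data)
{
    return (load8(data) & kHighBits) == kHighBits;
}
```

For an all-zero input, no_high_bit() is true while the other two are false, which is the mismatch pointed out above.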

Barry Schwarz

Apr 17, 2018, 12:35:17 PM

On Tue, 17 Apr 2018 02:22:09 -0700 (PDT), wyn...@gmail.com wrote:

>On Tuesday, April 17, 2018 at 5:08:55 PM UTC+8, wyn...@gmail.com wrote:

Since ! has higher precedence than &, isn't the expression evaluated
as
!(*reinterpret_cast<const uint64_t*>(data)) & 0x8080808080808080L
and since ! evaluates to 0 or 1 won't this always return 0?

After resolving the alignment issue, you would need
!(*reinterpret_cast<const uint64_t*>(data)&0x8080808080808080L)

You did not tell us what valid means in this context. It appears you
are looking for "normal" characters but you specifically raised the
issue of different CPUs. If so, be aware there are systems that don't
use ASCII (such as IBM mainframes that use EBCDIC) and on such systems
normal text like ABCD1234 would fail either test.
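
The precedence point can be checked in isolation on plain integers, leaving the pointer cast out of it. The sketch below is illustrative only; the names kMask, as_posted and parenthesized are made up.

```cpp
#include <cstdint>

constexpr std::uint64_t kMask = UINT64_C(0x8080808080808080);

// How the corrected is_valid2() is parsed: (!w) & kMask.
// !w is 0 or 1, and bit 0 is not part of the mask, so this is always false.
constexpr bool as_posted(std::uint64_t w)
{
    return ((!w) & kMask) != 0;
}

// With the parentheses moved: true exactly when no masked bit is set in w.
constexpr bool parenthesized(std::uint64_t w)
{
    return !(w & kMask);
}

static_assert(!as_posted(0), "as_posted is false even for all-zero input");
static_assert(parenthesized(0), "all-zero input has no top bits set");
static_assert(!parenthesized(0x80), "a set top bit is detected");
```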

--
Remove del for email

wyn...@gmail.com

Apr 18, 2018, 12:43:21 AM

On Tuesday, April 17, 2018 at 6:30:00 PM UTC+8, Chris Vine wrote:

You are right, thanks a lot. I'm astonished that compilers are so smart now!
These is_valid(..) functions are for checking long sequences of
'valid' 7-bit characters. Following your suggestion, I reimplemented
is_valid2(..) and tested it (these functions reside in a library, not inline).

#include <stdint.h>   // uint64_t
#include <string.h>   // ::memcpy

bool is_valid3(const char* data)
{
    uint64_t tt;
    ::memcpy(&tt,data,sizeof(tt));
    return !(tt&0x8080808080808080);
};

Compiler: g++ v7.3.1
CPU : Intel i5-6500

g++ test.cpp
Case1 (all valid) : avg=1.2975 faster (than is_valid(..))
Case2 (8 invalid, 1 invalid): avg=0.5086 faster (slower)

g++ test.cpp -O2
Case1 (all valid) : avg=1.8934 faster
Case2 (8 invalid, 1 invalid): avg=1.0723 faster

------------------------------------------------------------
And I also tested 6-bytes version of is_valid(..)

bool is_valid6(const char* data) // should be free of endian issue
{
    uint64_t tt(0x8080808080808080);
    ::memcpy(&tt,data,6);
    return !(tt&0x8080808080808080);
};

g++ test.cpp
Case1 (all valid) : avg=0.47313 faster (slower)
Case2 (8 invalid, 1 invalid): avg=0.22621 faster (slower)

g++ test.cpp -O2
Case1 (all valid) : avg=1.8695 faster
Case2 (8 invalid, 1 invalid): avg=1.0085 faster

[Conclusion] The memcpy implementation should be compiled with -O2 to gain
an advantage. I can't see that a union would make a significant difference.
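
One detail in is_valid6() as posted is worth checking: tt starts out as 0x8080808080808080 and memcpy() overwrites only six of its eight bytes, so the two untouched bytes keep their 0x80 pattern and the final test can never succeed. A sketch with a zero initial value, assuming the intent is to ignore the padding bytes, might look like this (is_valid_n is a hypothetical name and generalization, not something from the thread):

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical generalization: check the first n bytes of data, with n <= 8.
bool is_valid_n(const char* data, std::size_t n)
{
    std::uint64_t tt = 0;        // bytes not overwritten stay zero and never trip the mask
    std::memcpy(&tt, data, n);   // the caller guarantees n <= sizeof tt
    // Every byte of the mask is identical, so it does not matter which
    // bytes of tt were filled; the check is independent of byte order.
    return (tt & UINT64_C(0x8080808080808080)) == 0;
}
```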

wyn...@gmail.com

Apr 18, 2018, 1:05:03 AM

On Wednesday, April 18, 2018 at 12:35:17 AM UTC+8, Barry Schwarz wrote:

Thanks for reminding me of the precedence issue and the IBM machines.
The 7-bit encoded characters are stored on disk and transmitted/received
through sockets. I can't figure out how to deal with EBCDIC encoding.

wyn...@gmail.com

Apr 18, 2018, 1:45:53 AM

On Tuesday, April 17, 2018 at 6:29:45 PM UTC+8, Paavo Helde wrote:

Quite a few of the features added since C++11 do not seem clearly necessary
to me. I have been busy figuring out pre-C++11 C++ until now.
In this case, I'm not sure this lambda expression can beat memcpy.

In general, learning newer C++ features means one has to spend more
time finding 'bugs' than solving realistic problems (if one studies
or practices C++ deeply enough).

Juha Nieminen

Apr 18, 2018, 3:05:39 AM

wyn...@gmail.com wrote:
> bool is_valid2(const char* data)
> {
> return *reinterpret_cast<const uint64_t*>(data)&0x8080808080808080L;
> };

If the pointer is not aligned to a 64-bit integer boundary, it will cause
a hardware interrupt to be triggered in some architectures (which in
practice means the program will crash). With more permissive architectures
it will probably cause a slowdown (of some clock cycles).

The standard library does all kinds of optimizations similar to that one
all the time, but it always makes sure that pointers are correctly aligned.
(For example, many std::strcmp() implementations will compare more than one
byte at a time, but they do it in a manner that ensures alignment of the
compared values. In other words, they first compare byte-by-byte until the
first proper alignment offset, then they compare word-by-word, for a given
word size, and then they again compare byte-by-byte for any possible
remaining data.)
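
The head / body / tail pattern described above can be sketched for this thread's problem as follows. It is an illustration under stated assumptions rather than how any particular standard library does it: the name all_seven_bit is made up, the alignment check assumes that converting a pointer to std::uintptr_t yields its address (true on common platforms, but implementation-defined), and the word loads go through memcpy() so the sketch stays within the aliasing rules.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical name: true when none of the len bytes at data has its top bit set.
bool all_seven_bit(const char* data, std::size_t len)
{
    const char* p = data;
    const char* end = data + len;

    // Head: go byte by byte until p sits on an 8-byte boundary.
    while (p != end &&
           reinterpret_cast<std::uintptr_t>(p) % sizeof(std::uint64_t) != 0) {
        if (static_cast<unsigned char>(*p) & 0x80u) return false;
        ++p;
    }

    // Body: test one aligned 64-bit word at a time.
    while (static_cast<std::size_t>(end - p) >= sizeof(std::uint64_t)) {
        std::uint64_t w;
        std::memcpy(&w, p, sizeof w);
        if (w & UINT64_C(0x8080808080808080)) return false;
        p += sizeof w;
    }

    // Tail: finish the remaining (fewer than 8) bytes one by one.
    for (; p != end; ++p) {
        if (static_cast<unsigned char>(*p) & 0x80u) return false;
    }
    return true;
}
```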

Paavo Helde

Apr 18, 2018, 5:13:29 AM

Depends on what your goals are; you did not state them in the original post.

If you want maximum speed, memcpy is most probably faster than find_if
as it has 8 times less branching in the code.

If you want short code and fewer bugs then find_if is better IMO.

>
> In general, learning newer C++ features means one has to spend more
> time finding 'bugs' than solving realistic problems (If one studies
> or practices C++ deeply enough)

C++11 features are meant for reducing bugs, not for creating them. You
can do many things in much less code, meaning there is a lesser chance
to create bugs. Also, if you misuse something you often get compile-time
errors, unlike for memcpy() and friends.

wyn...@gmail.com

Apr 18, 2018, 9:00:38 AM

On Wednesday, April 18, 2018 at 5:13:29 PM UTC+8, Paavo Helde wrote:

Understood. I did not mean to dismiss your reply. And sorry, I made some typos.

> >
> > In general, learning newer C++ features means one has to spend more
> > time finding 'bugs' than solving realistic problems (If one studies
> > or practices C++ deeply enough)
>
> C++11 features are meant for reducing bugs, not for creating them. You
> can do many things in much less code, meaning there is a lesser chance
> to create bugs. Also, if you misuse something you often get compile-time
> errors, unlike for memcpy() and friends.

Those words mean a lot of things to me, but that is off-topic for this post. Think twice.

Rick C. Hodgin

Apr 18, 2018, 9:20:41 AM

I would simplify the source code for those who come after you so it
is more visually clear which operation is taking place and where. The
compiler will optimize away the extra declared local variable here:

bool is_valid2(const char* data)
{
    uint64_t d;

    d = *reinterpret_cast<const uint64_t*>(data);
    return(d & 0x8080808080808080L);
};

It is interchangeable on every platform I'm aware of because all bytes
are addressed identically across the full 8-byte range. Endianness
would not be a factor.

I wonder also if you don't actually want this code to test all bits,
rather than just some bits being on:

// Test if all bits are on
bool is_valid2(const char* data)
{
    uint64_t d;

    d = *reinterpret_cast<const uint64_t*>(data);
    return((d & 0x8080808080808080L) == 0x8080808080808080L);
};

??

--
Rick C. Hodgin

wyn...@gmail.com

Apr 18, 2018, 9:52:43 AM

On Wednesday, April 18, 2018 at 3:05:39 PM UTC+8, Juha Nieminen wrote:

> wyn...@gmail.com wrote:
> > bool is_valid2(const char* data)
> > {
> > return *reinterpret_cast<const uint64_t*>(data)&0x8080808080808080L;
> > };
>
> If the pointer is not aligned to a 64-bit integer boundary, it will cause
> a hardware interrupt to be triggered in some architectures (which in
> practice means the program will crash). With more permissive architectures
> it will probably cause a slowdown (of some clock cycles).
>

Thanks for the information. I know Intel x86-compatible CPUs should be
OK, while the C++ standard says it is not OK. But the standard's wording has
to cover nearly all kinds of CPU. In my case, I only care about CPUs that can
address, say, 2 GB of memory, and want to see whether is_valid2(..) can work there.

wyn...@gmail.com

Apr 18, 2018, 10:45:10 AM

On Wednesday, April 18, 2018 at 9:20:41 PM UTC+8, Rick C. Hodgin wrote:

I don't understand: are you saying my original is_valid2(..) won't work while
your version does (on the platforms you are aware of)?

> I wonder also if you don't actually want this code to test all bits,
> rather than just some bits being on:
>
> // Test if all bits are on
> bool is_valid2(const char* data)
> {
> uint64_t d;
>
> d = *reinterpret_cast<const uint64_t*>(data);
> return((d & 0x8080808080808080L) == 0x8080808080808080L);
> };
>
> ??
>
> --
> Rick C. Hodgin

No, I just need to check that every character is < 0x80 (when cast to
unsigned), something like checking for valid MIDI or ASCII data bytes.

And I feel '0x8080808080808080L' should be written as 0x8080808080808080,
with no trailing L (or LL), letting the compiler decide.

Rick C. Hodgin

Apr 18, 2018, 10:59:36 AM

On 4/18/2018 10:44 AM, wyn...@gmail.com wrote:
> On Wednesday, April 18, 2018 at 9:20:41 PM UTC+8, Rick C. Hodgin wrote:

> I don't understand if my original is_valid2(..) won't work while your
> version works (on platforms you are aware of)?

Yours will work. It is just unclear in source code because it's doing
all of those things on a single line. The operation is more clear when
you first break out the integer portion to its own variable, then perform
the & operation on that integer.

It's just for readability.

--
Rick C. Hodgin

Chris Vine

Apr 18, 2018, 11:02:23 AM

Garbage. It's undefined behaviour.

Rick C. Hodgin

Apr 18, 2018, 11:28:05 AM

On 4/18/2018 11:02 AM, Chris Vine wrote:
> On Wed, 18 Apr 2018 06:20:27 -0700 (PDT)
> "Rick C. Hodgin" <rick.c...@gmail.com> wrote:
>> I would simplify the source code for those who come after you so it
>> is more visually clear which operation is taking place and where. The
>> compiler will optimize away the extra declared local variable here:
>>
>> bool is_valid2(const char* data)
>> {
>> uint64_t d;
>>
>> d = *reinterpret_cast<const uint64_t*>(data);
>> return(d & 0x8080808080808080L);
>> };
>>
>> It is interchangeable on every platform I'm aware of because all bytes
>> are addressed identically across the full 8-byte range. Endianness
>> would not be a factor.
>>

> Garbage. It's undefined behaviour.

Only if you use architectures which do not allow such things. A
simple unit test would confirm if it works or not on any
architecture.

Compilers often allow things the standard does not. It's not a
guarantee it will work, but it's easy enough to validate.

Also, nice to know you actually read my posts. :-) I learn a lot
from you. You're very precise and technical.

--
Rick C. Hodgin

Chris Vine

Apr 18, 2018, 12:04:05 PM

On Wed, 18 Apr 2018 11:27:52 -0400

"Rick C. Hodgin" <rick.c...@gmail.com> wrote:
> On 4/18/2018 11:02 AM, Chris Vine wrote:
> > On Wed, 18 Apr 2018 06:20:27 -0700 (PDT)
> > "Rick C. Hodgin" <rick.c...@gmail.com> wrote:
> >> I would simplify the source code for those who come after you so it
> >> is more visually clear which operation is taking place and where. The
> >> compiler will optimize away the extra declared local variable here:
> >>
> >> bool is_valid2(const char* data)
> >> {
> >> uint64_t d;
> >>
> >> d = *reinterpret_cast<const uint64_t*>(data);
> >> return(d & 0x8080808080808080L);
> >> };
> >>
> >> It is interchangeable on every platform I'm aware of because all bytes
> >> are addressed identically across the full 8-byte range. Endianness
> >> would not be a factor.
> >>
> > Garbage. It's undefined behaviour.
>
> Only if you use architectures which do not allow such things. A
> simple unit test would confirm if it works or not on any
> architecture.

Wrong. It's undefined behaviour irrespective of the architecture.
It's about the compiler having the license to optimize the code in
reliance on the strict aliasing rule (which is included in the C and
C++ standards for optimization purposes), so potentially producing the
wrong result in this case. (It might also be undefined behaviour
because of architecture dependent alignment issues, but that's beside
the point: behaviour doesn't become defined by being doubly undefined.)

> Compilers often allow things the standard does not. It's not a
> guarantee it will work, but it's easy enough to validate.

So, update your compiler or change your optimization level and your code
ceases to work? What a load of horse manure.

There seem to be two possible explanations of your explanation.

First, you made a mistake, which anyone can do.

Or second, you are ignorant of the strict aliasing rule, which is
simple. Namely, that you cannot access an object via a pointer or
reference unless the latter's type is pointer or reference to the
dynamic type of the object, subject to the exceptions in §3.10/10 of
the C++11/14 standard (§6.10/10 of the C++17 standard)[1].

Since the OP said that the input form pointed to by 'data' was a
"character sequence", yours is a classic example of how not to do it.

> Also, nice to know you actually read my posts. :-)

I saw the response to your response and thought your advice needed to
be contradicted.

Chris

[1] The principal and most useful exception being that you can access
any object or part of an object through a pointer to char or unsigned
char or (since C++17) std::byte, and carry out pointer arithmetic in
doing so.
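
The exception in the footnote runs in the opposite direction from is_valid2(): an object of any type may be inspected through a pointer to unsigned char. A minimal sketch of that permitted direction follows; the function name count_high_bytes is made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: counts the bytes of value whose top bit is set.
std::size_t count_high_bytes(const std::uint64_t& value)
{
    // Reading the object representation through unsigned char* is allowed
    // by the aliasing rules; the reverse cast (char* to uint64_t*) is not.
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(&value);
    std::size_t n = 0;
    for (std::size_t i = 0; i < sizeof value; ++i) {
        if (bytes[i] & 0x80u) ++n;
    }
    return n;
}
```

The count is the same on big- and little-endian machines, since only the order in which the bytes are visited changes.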

Rick C. Hodgin

Apr 18, 2018, 12:21:47 PM

I understand that. That's why I say it's easily tested with a unit
test case.

>> Compilers often allow things the standard does not. It's not a
>> guarantee it will work, but it's easy enough to validate.
>
> So, update your compiler or change your optimization level and your code
> ceases to work? What a load of horse manure.

No. Use the code. Compile it. If it fails the test case, update
a #define setting and recompile to use the alternative mechanism.

The code the user is using above will be a couple instructions on a
64-bit machine. To iterate through a loop as you suggest is slower
and more complex.

If the simpler version can be used in your compiler, why not use it?

> There seem to be two possible explanations of your explanation.
>
> First, you made a mistake, which anyone can do.
>
> Or second, you are ignorant of the strict aliasing rule, which is
> simple. Namely, that you cannot access an object via a pointer or
> reference unless the latter's type is pointer or reference to the
> dynamic type of the object, subject to the exceptions in §3.10/10 of
> the C++11/14 standard (§6.10/10 of the C++17 standard)[1].
>
> Since the OP said that the input form pointed to by 'data' was a
> "character sequence", yours is a classic example of how not to do it.

It was option 3).

The algorithm he proposed tested the data in 8-byte chunks at a time.
If it's on a 64-bit machine it's a trivial test. It's even reasonably
trivial on a 32-bit machine.

It comes down to looking at what's happening at the machine level, as
opposed to looking at what the language allows.

Even in cases of undefined behavior, a startup test to see if something
will work will prevent it from computing improperly in the case where a
compiler optimized away some UB to produce results other than desired.

I look at the fundamental operation and acknowledge that, per the OP's
statement that data will always point to 8 or more bytes, that it is
the same operation on that data block.

One thing CAlive seeks to do is remove such limitations. CAlive allows
data to be processed as data, and not through type constraints. Each
time a type is brought to bear upon a piece of data, it assumes it is
known to be correct as per the developer's knowledge. If it is in some
way incorrect, then, and only then, will it produce UB, and it will be
solely the direct result of the data, and not the code being changed.

I think it's the better philosophy regarding data. It is after all...
data.

>> Also, nice to know you actually read my posts. :-)
>
> I saw the response to your response and thought your advice needed to
> be contradicted.

Thank you for leaping in to rescue the OP from my advice. I still
stand by my suggestion. The iterative loop method you propose is
slower and very likely completely unnecessary given the nature of
data computing in assembly / machine code.

> Chris
>
> [1] The principal and most useful exception being that you can access
> any object or part of an object through a pointer to char or unsigned
> char or (since C++17) std::byte, and carry out pointer arithmetic in
> doing so.

See. Very precise and technical. I like that.

--
Rick C. Hodgin

james...@verizon.net

Apr 18, 2018, 12:56:42 PM

On Wednesday, April 18, 2018 at 11:28:05 AM UTC-4, Rick C. Hodgin wrote:
> On 4/18/2018 11:02 AM, Chris Vine wrote:
> > On Wed, 18 Apr 2018 06:20:27 -0700 (PDT)
> > "Rick C. Hodgin" <rick.c...@gmail.com> wrote:
> >> I would simplify the source code for those who come after you so it
> >> is more visually clear which operation is taking place and where. The
> >> compiler will optimize away the extra declared local variable here:
> >>
> >> bool is_valid2(const char* data)
> >> {
> >> uint64_t d;
> >>
> >> d = *reinterpret_cast<const uint64_t*>(data);
> >> return(d & 0x8080808080808080L);
> >> };
> >>
> >> It is interchangeable on every platform I'm aware of because all bytes
> >> are addressed identically across the full 8-byte range. Endianness
> >> would not be a factor.
> >>
> > Garbage. It's undefined behaviour.
>
> Only if you use architectures which do not allow such things.

Whether or not code has undefined behavior is specified by the C++
standard, not the architecture. It can indirectly depend upon
implementation-specific issues, such as the range of representable values
of a given type, but only insofar as what the C++ standard says about the
code covers those issues.

> ... A simple unit test would confirm if it works or not on any
> architecture.

"Undefined behavior" cannot be disproved by a unit test. Undefined
behavior means that there are no restrictions on what the program can do.
Therefore, no matter what result you expect from your unit test, that
result is compatible with "undefined behavior". In particular,
"undefined behavior" permits, as one possibility, that your code will do
precisely whatever it was you incorrectly thought it was required to do.

> Compilers often allow things the standard does not.

The standard imposes no requirements on the behavior when it is
undefined, so it's impossible for there to be any behavior allowed by
the compiler that is not also allowed by the standard.

Rick C. Hodgin

Apr 18, 2018, 1:22:15 PM

On 4/18/2018 12:56 PM, james...@verizon.net wrote:
> On Wednesday, April 18, 2018 at 11:28:05 AM UTC-4, Rick C. Hodgin wrote:
>> On 4/18/2018 11:02 AM, Chris Vine wrote:
>>> On Wed, 18 Apr 2018 06:20:27 -0700 (PDT)
>>> "Rick C. Hodgin" <rick.c...@gmail.com> wrote:
>>>> I would simplify the source code for those who come after you so it
>>>> is more visually clear which operation is taking place and where. The
>>>> compiler will optimize away the extra declared local variable here:
>>>>
>>>> bool is_valid2(const char* data)
>>>> {
>>>> uint64_t d;
>>>>
>>>> d = *reinterpret_cast<const uint64_t*>(data);
>>>> return(d & 0x8080808080808080L);
>>>> };
>>>>
>>>> It is interchangeable on every platform I'm aware of because all bytes
>>>> are addressed identically across the full 8-byte range. Endianness
>>>> would not be a factor.
>>>>
>>> Garbage. It's undefined behaviour.
>>
>> Only if you use architectures which do not allow such things.
>
> Whether or not code has undefined behavior is specified by the C++
> standard, not the architecture. It can indirectly depend upon
> implementation-specific issue, such as the range of representable values
> of a given type, but only insofar as what the C++ says about the code
> covers those issues.

I understand.

>> ... A
>> simple unit test would confirm if it works or not on any
>> architecture.
>
> "Undefined behavior" cannot be disproved by a unit test. Undefined
> behavior means that there's no restrictions on what the program can do.
> Therefore, no matter what result you expect from your unit test, that
> result is compatible with "undefined behavior". In particular,
> "undefined behavior" permits, as one possibility, that your code will do
> precisely whatever it was you incorrectly thought it was required to do.

I disagree. You can send it test values and see if it produces what
you expect from a positive and a negative, for example. If it gives
you the correct results, it is working correctly.

>> Compilers often allow things the standard does not.
>
> The standard imposes no requirements on the behavior when it is
> undefined, so it's impossible for there to be any behavior allowed by
> the compiler that is not also allowed by the standard.

The compiler will generate code which works based on the operation,
even if the standard says it's undefined behavior. It will generate
the appropriate code sequence which would be expected from what the
source code asks for, even if it is not defined behavior by the
compiler.

Here is what MSVC++ produces for that bit of code in a 64-bit compile:

is_valid2 PROC ; is_valid2, COMDAT
; 36 : {
; 37 : return *reinterpret_cast<const uint64_t*>(data) & 0x8080808080808080L;
    mov rax, QWORD PTR data$[rbp]
    mov rcx, 8080808080808080H
    mov rax, QWORD PTR [rax]
    and rax, rcx
    test rax, rax
    je SHORT $LN3@is_valid2
    mov BYTE PTR tv66[rbp], 1
    jmp SHORT $LN4@is_valid2
$LN3@is_valid2:
    mov BYTE PTR tv66[rbp], 0
$LN4@is_valid2:
    movzx eax, BYTE PTR tv66[rbp]
; 38 : };
    ret 0
is_valid2 ENDP ; is_valid2

In short, it generates the expected operation. MSVC++ allows that
particular operation to go through correctly.

In optimized code, it produces a far simplified version that's brought
inline so no function call is made:

mov rdi, 8080808080808080H
test QWORD PTR data, rdi

But still the same operation, and exactly the one is_valid2() was
designed to produce ... exactly.

It's UB in the C++ standard, but it works in the compiler. And a
unit test case at startup passing in a data with all 0x80 values,
and one without, will tell if it works.

--
Rick C. Hodgin

Chris Vine

Apr 18, 2018, 1:56:32 PM

On Wed, 18 Apr 2018 12:21:35 -0400

If you think so then you do not understand it.

> >> Compilers often allow things the standard does not. It's not a
> >> guarantee it will work, but it's easy enough to validate.
> >
> > So, update your compiler or change your optimization level and your code
> > ceases to work? What a load of horse manure.
>
> No. Use the code. Compile it. If it fails the test case, update
> a #define setting and recompile to use the alternative mechanism.

See above.

> The code the user is using above will be a couple instructions on a
> 64-bit machine. To iterate through a loop as you suggest is slower
> and more complex.
>
> If the simpler version can be used in your compiler, why not use it?
>
> > There seem to be two possible explanations of your explanation.
> >
> > First, you made a mistake, which anyone can do.
> >
> > Or second, you are ignorant of the strict aliasing rule, which is
> > simple. Namely, that you cannot access an object via a pointer or
> > reference unless the latter's type is pointer or reference to the
> > dynamic type of the object, subject to the exceptions in §3.10/10 of
> > the C++11/14 standard (§6.10/10 of the C++17 standard)[1].
> >
> > Since the OP said that the input form pointed to by 'data' was a
> > "character sequence", yours is a classic example of how not to do it.
>
> It was option 3).
>
> The algorithm he proposed tested the data in 8-byte chunks at a time.
> If it's on a 64-bit machine it's a trivial test. It's even reasonably
> trivial on a 32-bit machine.
>
> It comes down to looking at what's happening at the machine level, as
> opposed to looking at what the language allows.

[more of the same snipped]

There is no option 3. See above

> One thing CAlive seeks to do is remove such limitations. CAlive allows
> data to be processed as data, and not through type constraints. Each
> time a type is brought to bear upon a piece of data, it assumes it is
> known to be correct as per the developer's knowledge. If it is in some
> way incorrect, then, and only then, will it produce UB, and it will be
> solely the direct result of the data, and not the code being changed.
>
> I think it's the better philosophy regarding data. It is after all...
> data.

It is not about philosophy, whether about data or anything else. That
your code breaches the C++ language specification is the end of it.
That there is no strict aliasing rule in CAlive or some other language
(vapourware or otherwise) is irrelevant to the issue, as is whether you
think there should be a strict aliasing rule in C and C++. This is a
C++ newsgroup, there is a strict aliasing rule in C++ and if he is
writing C++ code with a C++ compiler he needs to know it.

[snip]

> The iterative loop method you propose is slower and very likely
> completely unnecessary given the nature of data computing in
> assembly / machine code.

I wasn't proposing an iterative loop. You seem to be clueless. What
makes your attitude even more ridiculous is that there is a
zero-overhead way of doing it right.

Rick C. Hodgin

Apr 18, 2018, 2:17:19 PM

I agree it is the end of it with regards to the C++ language standard.
My position is it's not the end of it with regards to the compiler. The
compiler is free to do what it wants in the cases of UB, including do
the correct operation.

> That there is no strict aliasing rule in CAlive or some other language
> (vapourware or otherwise) is irrelevant to the issue, as is whether you
> think there should be a strict aliasing rule in C and C++. This is a
> C++ newsgroup, there is a strict aliasing rule in C++ and if he is
> writing C++ code with a C++ compiler he needs to know it.

Agreed. If the C++ compiler adheres explicitly to the standard it
may produce unusable code. If, however, it goes ahead and performs
the operation, as we just saw MSVC++ does, then it is working in that
compiler.

> [snip]
>> The iterative loop method you propose is slower and very likely
>> completely unnecessary given the nature of data computing in
>> assembly / machine code.
>
> I wasn't proposing an iterative loop.

I apologize. I mistook this post from Paavo Helde for being from you:

bool is_valid(const char* data) {

return std::find_if(data, data+8,
[](char c) {return c&'\x80';})==data+8;

My mistake.

> You seem to be clueless. What
> makes your attitude even more ridiculous is that there is a
> zero-overhead way of doing it right.

It's not zero-overhead. Your proposed memcpy() is iterative, and
operates on the data on a byte-by-byte basis. It is slower than
the proposal by the OP, and while yours may be conforming ... who
cares if his faster method works? If his goal is to be expressly
conforming, then it matters. But if he's targeting a range of
tools where it will work using is_valid2() ... then honestly, who
cares? Every C++ compiler is different and these things can be
wrangled into tests and validated at startup with the simple load
of a test case library that calls some functions included in the
main executable.

--
Rick C. Hodgin

Chris Vine

Apr 18, 2018, 2:40:36 PM

On Wed, 18 Apr 2018 14:17:06 -0400
"Rick C. Hodgin" <rick.c...@gmail.com> wrote:
> On 4/18/2018 1:56 PM, Chris Vine wrote:

[snip]

> > You seem to be clueless. What
> > makes your attitude even more ridiculous is that there is a
> > zero-overhead way of doing it right.
>
> It's not zero-overhead. Your proposed memcpy() is iterative, and
> operates on the data on a byte-by-byte basis. It is slower than
> the proposal by the OP, and while yours may be conforming ... who
> cares if his faster method works? If his goal is to be expressly
> conforming, then it matters. But if he's targeting a range of
> tools where it will work using is_valid2() ... then honestly, who
> cares? Every C++ compiler is different and these things can be
> wrangled into tests and validated at startup with the simple load
> of a test case library that calls some functions included in the
> main executable.

It is not iterative in the sense of a loop. It is an intrinsic/built-in
which will do a block byte transfer at worst, and if your
reinterpret_cast does not fail on alignment grounds will be elided
entirely.

So you propose to recommend undefined behaviour on the ground that the
one compiler you have tested your reinterpret_cast with (VS) gives the
correct results, but not to adopt defined behaviour even though (i)
every compiler you test it with will elide the memcpy away if your
reinterpret_cast can possibly work, to produce optimal code, and (ii)
every update to VS or change to the program compile parameters may
break your version.

The correct version will be just as fast as your incorrect version, if
compiled with -O or higher and your version actually works.

Your view is deranged.

Rick C. Hodgin

Apr 18, 2018, 2:56:21 PM

On 4/18/2018 2:40 PM, Chris Vine wrote:
> It is not iterative in the sense of a loop. It is an intrinsic/built-in
> which will do a block byte transfer at worst, and if your
> reinterpret_cast does not fail on alignment grounds will be elided
> entirely.

What is your code for this explicit example? How would you write it
using memcpy() to have it be as fast as the OP's is_valid2()?

--
Rick C. Hodgin

Chris Vine

Apr 18, 2018, 3:21:02 PM

On Wed, 18 Apr 2018 14:56:09 -0400

"Rick C. Hodgin" <rick.c...@gmail.com> wrote:

You replace your assignment-with-reinterpret_cast to 'd' with a
memcpy() to 'd', and compile with optimization enabled. That's it.

This might help you (the memcpy() version which compiles to the same
code as the cast version is at the end):

https://blog.regehr.org/archives/959

Also, since you appear to like authority figures, the author may
persuade you where I have failed.

james...@verizon.net

Apr 18, 2018, 3:21:16 PM

I'm not talking about determining whether it's working correctly. I'm
talking about determining whether or not the behavior is undefined.
Since "undefined behavior" imposes no restrictions on the behavior of
your code, the fact that your code might happen to work correctly does
not count as disproving that your code has undefined behavior.

Why should you care whether the behavior is undefined as long as the
code works?

If the behavior is defined by the standard, you can reasonably expect a
compiler that claims to be conforming to generate an executable that
provides that behavior, and will continue to provide that behavior. If
the behavior is undefined, the fact that the executable passed one unit
test provides no justification for assuming that it will pass the same
unit test the next time that it is run with the same inputs (not even if
it's exactly the same executable, with no recompilation). You have even
less justification for assuming it will continue passing that test if
compiled a second time, particularly if it's compiled by a different
standard-conforming compiler.

> >> Compilers often allow things the standard does not.
> >
> > The standard imposes no requirements on the behavior when it is
> > undefined, so it's impossible for there to be any behavior allowed by
> > the compiler that is not also allowed by the standard.
>
> The compiler will generate code which works based on the operation,
> even if the standard says it's undefined behavior.

Which is perfectly allowable, according to the standard.

> ... It will generate
> the appropriate code sequence which would be expected from what the
> source code asks for,

Well, that depends upon what you expect of the source code. If you have
any particular expectations for the behavior when the behavior is
undefined, those expectations don't come from a proper reading of the C
standard. A proper understanding of the C standard will give you no
justification for being surprised by any particular behavior exhibited
by the code - so in one sense, that means the code's behavior has met
expectations, regardless of what that behavior is.

> In short, it generates the expected operation.

And a fully conforming implementation of C is also permitted to generate
an executable that violates your unjustified expectations about the
behavior of this code.

> It's UB in the C++ standard, but it works in the compiler. And a

That's bad wording: the word "but" implies a conflict between the two
statements. UB is fully compatible with the code working exactly as you
incorrectly thought it was required to work.

Rick C. Hodgin

Apr 18, 2018, 3:33:54 PM

On 4/18/2018 3:20 PM, Chris Vine wrote:
> "Rick C. Hodgin" <rick.c...@gmail.com> wrote:
>> What is your code for this explicit example? How would you write it
>> using memcpy() to have it be as fast as the OP's is_valid2()?
>
> You replace your assignment-with-reinterpret_cast to 'd' with a
> memcpy() to 'd', and compile with optimization enabled. That's it.

I do not understand how this code would appear or work. Can you
please provide an example of that in source code that compiles?

As I read it I see this:

bool is_valid2(const char* data)
{
char d[8];

memcpy(d, data, 8);

// How do I test each byte's 0x80 bit?
// return(d & 0x8080808080808080L);

Rick C. Hodgin

Apr 18, 2018, 3:45:07 PM

On 4/18/2018 3:21 PM, james...@verizon.net wrote:
> Why should you care whether the behavior is undefined as long as the
> code works?
>
> If the behavior is defined by the standard, you can reasonably expect a
> compiler that claims to be conforming to generate an executable that
> provides that behavior, and will continue to provide that behavior. If
> the behavior is undefined, the fact that the executable passed one unit
> test provides no justification for assuming that it will pass the same
> unit test the next time that it is run with the same inputs (not even if
> it's exactly the same executable, with no recompilation). You have even
> less justification for assuming it will continue passing that test if
> compiled a second time, particularly if it's compiled by a different
> standard-conforming compiler.

There's no argument to that. UB can produce code doing anything it
wants so the discussion ends there.

My point is look at what your compiler generates and if it generates
proper code ... use it.

--
Rick C. Hodgin

james...@verizon.net

Apr 18, 2018, 3:49:48 PM

On Wednesday, April 18, 2018 at 3:33:54 PM UTC-4, Rick C. Hodgin wrote:
> On 4/18/2018 3:20 PM, Chris Vine wrote:
> > "Rick C. Hodgin" <rick.c...@gmail.com> wrote:
> >> What is your code for this explicit example? How would you write it
> >> using memcpy() to have it be as fast as the OP's is_valid2()?
> >
> > You replace your assignment-with-reinterpret_cast to 'd' with a
> > memcpy() to 'd', and compile with optimization enabled. That's it.
>
> I do not understand how this code would appear or work. Can you
> please provide an example of that in source code that compiles?
>
> As I read it I see this:
>
> bool is_valid2(const char* data)
> {
> char d[8];

Why did you change the type of 'd'? He said nothing about doing that in his suggestion. It's supposed to be

uint64_t d;

>
> memcpy(d, data, 8);

memcpy(&d, data, sizeof d);

> // How do I test each byte's 0x80 bit?
> // return(d & 0x8080808080808080L);

That's not permitted with your definition of d, since it's an lvalue of array type that gets implicitly converted into a pointer to the first element of that array. As such, it violates 6.5.10p2. With d corrected to an integer type, there's still the issue that the OP apparently had his logic reversed for is_valid2(). I believe that it should be:

return !(d & 0x8080808080808080L);
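
Putting those corrections together, the whole function might look like the sketch below. The ULL suffix on the mask and the comments are additions made here for illustration, not something taken from the earlier posts; the function assumes, as the OP stated, that data points to at least 8 readable bytes.

```cpp
#include <cstdint>
#include <cstring>

// Returns true when none of the 8 bytes at `data` has its high bit set.
// `data` must point to at least 8 readable bytes; no alignment is required.
bool is_valid2(const char* data)
{
    std::uint64_t d;
    std::memcpy(&d, data, sizeof d);      // defined copy of the 8 bytes into d
    return !(d & 0x8080808080808080ULL);  // true only if every high bit is clear
}
```

Because the mask has the same bit set in every byte, the result does not depend on the byte order of the target.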

David Brown

Apr 18, 2018, 3:52:07 PM

On 18/04/18 17:27, Rick C. Hodgin wrote:
> On 4/18/2018 11:02 AM, Chris Vine wrote:
>> On Wed, 18 Apr 2018 06:20:27 -0700 (PDT)
>> "Rick C. Hodgin" <rick.c...@gmail.com> wrote:
>>> I would simplify the source code for those who come after you so it
>>> is more visually clear which operation is taking place and where. The
>>> compiler will optimize away the extra declared local variable here:
>>>
>>>      bool is_valid2(const char* data)
>>>      {
>>>          uint64_t d;
>>>          d = *reinterpret_cast<const uint64_t*>(data);
>>>          return(d & 0x8080808080808080L);
>>>      };
>>>
>>> It is interchangeable on every platform I'm aware of because all bytes
>>> are addressed identically across the full 8-byte range. Endianness
>>> would not be a factor.
>>>
>> Garbage. It's undefined behaviour.
>
> Only if you use architectures which do not allow such things. A
> simple unit test would confirm if it works or not on any
> architecture.

No, it will not.

There are two potential problems here. One is the alignment issue. On
some processors (such as the x86), you don't have to have correct
alignment for loading a 64-bit value - on other processors, you /do/
need it. Even on the x86, there are instructions that can't work
unaligned - if the compiler decides to use an SIMD load here (maybe the
function is inlined, and in part of a loop) then the unaligned load will
fail for some SIMD instructions.

The other problem is that reading "const char" data through a pointer to
a "const uint64_t" is undefined behaviour as they are incompatible pointers.

It /might/ work as you expect. It might work with some compilers, and
not with others. It might work with some flags, and not with others.
It might work when the code is a separate function, but not when it is
inlined. It might work on Tuesdays but not Wednesdays. (Okay, that is
unlikely - but you get the point.)

Unit testing is great for showing that valid code with well-defined
behaviour has the behaviour you want. It is useless for determining if
the behaviour is well-defined in the first place.
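
To illustrate the first half of that point: once the function is written with defined behaviour, a test is meaningful. The following is only a sketch, with hypothetical names (is_valid_memcpy, is_valid_bytes, count_mismatches), that compares a memcpy()-based check against a byte-by-byte reference at every offset of a small buffer, so that unaligned starting addresses are exercised too.

```cpp
#include <cstdint>
#include <cstring>

// Word-at-a-time check using memcpy(); defined for any alignment.
bool is_valid_memcpy(const char* data)
{
    std::uint64_t d;
    std::memcpy(&d, data, sizeof d);
    return !(d & 0x8080808080808080ULL);
}

// Byte-by-byte reference implementation.
bool is_valid_bytes(const char* data)
{
    for (int i = 0; i < 8; ++i)
        if (static_cast<unsigned char>(data[i]) & 0x80u)
            return false;
    return true;
}

// Compares both versions for every start offset 0..7 and for every
// position of a single high-bit byte (bad == -1 means "no such byte").
// Returns the number of disagreements.
int count_mismatches()
{
    int mismatches = 0;
    char buf[16];
    for (int offset = 0; offset < 8; ++offset) {
        for (int bad = -1; bad < 8; ++bad) {
            std::memset(buf, 'a', sizeof buf);
            if (bad >= 0)
                buf[offset + bad] = static_cast<char>(0x80);
            if (is_valid_memcpy(buf + offset) != is_valid_bytes(buf + offset))
                ++mismatches;
        }
    }
    return mismatches;
}
```

A result of zero here says something about the memcpy() version on every conforming compiler. The same test run against the reinterpret_cast version would say nothing about whether that code is defined.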

>
> Compilers often allow things the standard does not.

True.

> It's not a
> guarantee it will work, but it's easy enough to validate.

Not true.

It is easy to validate if you have a compiler that documents the
behaviour (possibly with a flag - such as -fno-strict-aliasing, or with
an extension such as the "may_alias" type attribute). If the compiler
manual does not document this as an extension, it is dangerous to rely
on it.
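
As a sketch of what relying on a documented extension could look like: GCC and Clang document a may_alias type attribute, and an aligned attribute that may lower the alignment of a typedef. The names u64_any and is_valid_ext below are hypothetical, and the fallback branch is simply the memcpy() approach for compilers that do not define __GNUC__.

```cpp
#include <cstdint>
#include <cstring>

#if defined(__GNUC__)
// GCC/Clang extension: a 64-bit type that is allowed to alias any other
// type and that carries an alignment requirement of 1.
typedef std::uint64_t u64_any __attribute__((may_alias, aligned(1)));

bool is_valid_ext(const char* data)
{
    return !(*reinterpret_cast<const u64_any*>(data) & 0x8080808080808080ULL);
}
#else
// Portable fallback: plain memcpy(), defined on every conforming compiler.
bool is_valid_ext(const char* data)
{
    std::uint64_t d;
    std::memcpy(&d, data, sizeof d);
    return !(d & 0x8080808080808080ULL);
}
#endif
```

Note that may_alias on its own addresses only the aliasing problem; the aligned(1) part is what addresses the alignment problem.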

Chris Vine

Apr 18, 2018, 3:58:30 PM

On Wed, 18 Apr 2018 15:33:42 -0400

Are you serious? I even gave you a link to a similar case.

#include <cstdint>
#include <cstring>

bool is_valid2(const char* data) {
    std::uint64_t d;
    std::memcpy(&d, data, 8);
    return(d & 0x8080808080808080L);
}

Rick C. Hodgin

Apr 18, 2018, 3:58:43 PM

On 4/18/2018 3:49 PM, james...@verizon.net wrote:
> On Wednesday, April 18, 2018 at 3:33:54 PM UTC-4, Rick C. Hodgin wrote:
>> On 4/18/2018 3:20 PM, Chris Vine wrote:
>>> "Rick C. Hodgin" <rick.c...@gmail.com> wrote:
>>>> What is your code for this explicit example? How would you write it
>>>> using memcpy() to have it be as fast as the OP's is_valid2()?
>>>
>>> You replace your assignment-with-reinterpret_cast to 'd' with a
>>> memcpy() to 'd', and compile with optimization enabled. That's it.
>>
>> I do not understand how this code would appear or work. Can you
>> please provide an example of that in source code that compiles?
>>
>> As I read it I see this:
>>
>> bool is_valid2(const char* data)
>> {
>> char d[8];
>
> Why did you change the type of 'd'? He said nothing about doing that in his suggestion. It's supposed to be

Because I didn't understand what he was saying. I couldn't visualize
it. And also because he wouldn't answer me with an explicit source
code example to clarify.

> uint64_t d;
> memcpy(&d, data, sizeof d);
> return !(d & 0x8080808080808080L);

The single memcpy over d makes it conforming, eh? And the explicit
non-memcpy() copy through the cast pointer isn't conforming, eh?

CAlive here I come. :-) I will never be limited by such things again.

--
Rick C. Hodgin

james...@verizon.net

Apr 18, 2018, 3:59:12 PM

I barely have time to design, write, and test my code. Checking whether the generated code is "proper" would add weeks of delay to every delivery I made. I wonder, just how small are the programs that you write that you can afford the time needed to review the generated code?

Rick C. Hodgin

Apr 18, 2018, 4:03:03 PM

I couldn't visualize it, Chris. And in all honesty, after seeing
what James posted, and what you post here ... I am baffled at how
that explicit memcpy() into d would work, and a cast pointer
dereference copy, which will move the same 8 bytes, would not work.

It seems a severe limitation to the compiler and/or C++ language.

You say multiple times that I am clueless and I'm deranged and
that I have failed ... I would argue that these limitations in how
the language works are significant, and inappropriate, and are to
be replaced.

Data is data. It should be viewed as such. I believe that is the
correct philosophy to look at things like this, and any standard
which does not correlate something like a cast pointer copying 8
bytes, compared to a memcpy() which copies 8 bytes, is the insane
component of that discussion.

Regardless of my position on this, I respect your knowledge and
expertise in C++, and I thank you for your multiple replies.

--
Rick C. Hodgin

Rick C. Hodgin

Apr 18, 2018, 4:14:02 PM

Pretty small. 10s of thousands of lines per app max.

It's not that I review the generated code, it's more when I encounter
something and I'm not sure how it will work in the compiler, I test it.
I typically will test things in MSVC++ and GCC (MinGW).

In addition, I am fluent in x86 assembly, and I know how things can
(and should) work at the assembly level to manipulate data without
the constraints of a language protocol... so I draw on that knowledge
as well.

--
Rick C. Hodgin

David Brown

Apr 18, 2018, 4:22:29 PM

On 18/04/18 20:17, Rick C. Hodgin wrote:
> On 4/18/2018 1:56 PM, Chris Vine wrote:
>> On Wed, 18 Apr 2018 12:21:35 -0400
>> "Rick C. Hodgin" <rick.c...@gmail.com> wrote:
>>> On 4/18/2018 12:03 PM, Chris Vine wrote:

<snip>

>> It is not about philosophy, whether about data or anything else. That
>> your code breaches the C++ language specification is the end of it.
>
> I agree it is the end of it with regards to the C++ language standard.
> My position is it's not the end of it with regards to the compiler. The
> compiler is free to do what it wants in the cases of UB, including do
> the correct operation.

There is no "correct operation" as far as C++ is concerned. The code
has undefined behaviour - that means it does not make sense in C++ or to
the compiler, even if you feel the intention of the code is clear to a
human reader. (Which it is, in this particular case.)

Imagine it as though someone had written the sentence "I went for a
drive in my bar". This is grammatically correct, and has correct
spelling - a computer spell-checker cannot spot the problem. Most
people would realise "bar" was a typo for "car" - your "unit test" on
the sentence would pass. But some people - perhaps someone with a
different native language, would get confused. The sentence has
"undefined behaviour".

>
>> That there is no strict aliasing rule in CAlive or some other language
>> (vapourware or otherwise) is irrelevant to the issue, as is whether you
>> think there should be a strict aliasing rule in C and C++. This is a
>> C++ newsgroup, there is a strict aliasing rule in C++ and if he is
>> writing C++ code with a C++ compiler he needs to know it.
>
> Agreed. If the C++ compiler adheres explicitly to the standard it
> may produce unusable code. If, however, it goes ahead and performs
> the operation, as we just saw MSVC++ does, then it is working in that
> compiler.

Code like this certainly /can/ have defined behaviour for particular
compilers and/or flags. But you can only rely on it if it is
documented, and if you are sure the code will only be used on such a
compiler. Otherwise it is a very subtle error waiting to creep up on
people.

(It's fine to write code that is specific for a particular compiler -
but you should do so only if you have good reason for it. And you
should document it, and ideally cause compile-time failures if the
assumptions about the tools are broken.)
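
A minimal sketch of such a compile-time guard follows. It assumes a hypothetical project that has only been validated with GCC-compatible compilers (those defining __GNUC__); the flag name toolchain_assumptions_hold is made up for illustration.

```cpp
#include <climits>

// Assumption for this sketch: the code below this guard has only been
// validated with GCC-compatible compilers, so anything else is rejected
// at compile time instead of failing in some subtle way at run time.
#if !defined(__GNUC__)
#error "Unvalidated compiler: use the portable memcpy() version instead."
#endif

// The 0x80-per-byte mask only makes sense with 8-bit bytes.
static_assert(CHAR_BIT == 8, "the 0x80 mask assumes 8-bit bytes");

// Only reachable when every check above has passed.
constexpr bool toolchain_assumptions_hold = true;
```

The compiler-specific code would then sit below these checks, so that a port to another toolchain stops with a readable message.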

>
>> [snip]
>>> The iterative loop method you propose is slower and very likely
>>> completely unnecessary given the nature of data computing in
>>> assembly / machine code.
>>
>> I wasn't proposing an iterative loop.
> I apologize. I mistook this post from Paavo Helde for being from you:
>
>     bool is_valid(const char* data) {
>         return std::find_if(data, data+8,
>          [](char c) {return c&'\x80';})==data+8;
>
> My mistake.
>
> > You seem to be clueless. What
> > makes your attitude even more ridiculous is that there is a
> > zero-overhead way of doing it right.
>
> It's not zero-overhead. Your proposed memcpy() is iterative, and
> operates on the data on a byte-by-byte basis.

Logically, yes, memcpy() is byte for byte. In practice, good compilers
will optimise memcpy() very nicely when the operands are appropriate.
In a case like this, a good compiler (with optimisation enabled,
obviously) will do a single 64-bit load on an architecture that supports
unaligned loads. It will do its best in other cases - using
byte-for-byte loads if needed, or bigger loads if the compiler has some
information about the alignment.

So the memcpy() solution will be as fast as your version on any target
that allows unaligned loads, and /correct/ on all targets regardless of
optimisations, flags, compiler variations, etc.

> It is slower than
> the proposal by the OP, and while yours may be conforming ... who
> cares if his faster method works? If his goal is to be expressly
> conforming, then it matters. But if he's targeting a range of
> tools where it will work using is_valid2() ... then honestly, who
> cares? Every C++ compiler is different and these things can be
> wrangled into tests and validated at startup with the simple load
> of a test case library that calls some functions included in the
> main executable.
>

There are certainly cases where implementation-specific code is fine.
If the code is full of calls to WinAPI functions, then relying on x86
features is perfectly reasonable. If it is full of MSVC extensions,
then relying on MSVC behaviour is also fine. (I don't know if MSVC
documents that it allows such pointer casts in this way.)

It is also fine to do:

#if __COMPILER_XXX

// Fast implementation known to work on XXX
bool is_valid(...

#else

// Possibly slow, but definitely correct fall-back version
bool is_valid(...

#endif

But in this particular case, memcpy() is your friend. In general,
memcpy with a small fixed size should be well optimised.

David Brown

Apr 18, 2018, 4:25:51 PM

You can always read the memory with memcpy(). That lets you store it
where you want.
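
One way to package that idea, purely as a sketch: a small helper template (load_as is a hypothetical name, not a standard function) that copies the bytes into an object of the requested type. It covers both the 8-byte case and the 4-byte uint32_t variant the OP asked about.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// Reads sizeof(T) bytes starting at p into a fresh T. The caller must
// guarantee that many bytes are readable; no alignment is required.
template <typename T>
T load_as(const char* p)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy() is only appropriate for trivially copyable types");
    T value;
    std::memcpy(&value, p, sizeof value);
    return value;
}

// True when none of the 8 bytes has its high bit set.
bool is_valid8(const char* data)
{
    return !(load_as<std::uint64_t>(data) & 0x8080808080808080ULL);
}

// Same check for a 4-character sequence.
bool is_valid4(const char* data)
{
    return !(load_as<std::uint32_t>(data) & 0x80808080U);
}
```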

>
> CAlive here I come. :-) I will never be limited by such things again.
>

"Strict aliasing" is not really a limitation. It turns up surprisingly
rarely, considering how much some people complain about it. On the
other hand, it gives surprisingly few opportunities for optimisation.
You just have to know about it, and take it into account when writing C
or C++ code.

Chris Vine

Apr 18, 2018, 4:26:07 PM

On Wed, 18 Apr 2018 16:02:50 -0400

It's because both the C and C++ standards say so. More particularly,
the dynamic type (in C "effective type") of the destination of memcpy
is the declared type of that destination and not the type of the source.
So strict aliasing rules are complied with.

It really is not that difficult. Imagine that your declaration of 'd'
and assignment with reinterpret_cast were not optimized out. The
assignment and memcpy do exactly the same thing (think about what an
assignment is) and the compiler would probably emit the same machine
code. Of course it is optimized out, as is the memcpy().

Please read the link I sent you. I think it may unbaffle you.

Rick C. Hodgin

Apr 18, 2018, 4:27:42 PM

On 4/18/2018 4:22 PM, David Brown wrote:
> On 18/04/18 20:17, Rick C. Hodgin wrote:
>> It's not zero-overhead. Your proposed memcpy() is iterative, and
>> operates on the data on a byte-by-byte basis.
>
> Logically, yes, memcpy() is byte for byte. In practice, good compilers
> will optimise memcpy() very nicely when the operands are appropriate. In
> a case like this, a good compiler (with optimisation enabled, obviously)
> will do a single 64-bit load on an architecture that supports unaligned
> loads. It will do its best in other cases - using byte-for-byte loads
> if needed, or bigger loads if the compiler has some information about
> the alignment.
>
> So the memcpy() solution will be as fast as your version on any target
> that allows unaligned loads, and /correct/ on all targets regardless of
> optimisations, flags, compiler variations, etc.

I didn't realize he was simply talking about copying the 8 bytes
from data to &d, and then using them that way.

And, as you'll see I indicated in a later post, I don't understand
how a compiler can look at a memcpy() of 8 bytes from the same source
as a cast pointer dereference copy of the same 8 bytes can be UB in
one case, and not UB in another.

It shows how close behavior is to typing in C++. It is far more
relaxed in CAlive. Anything you throw at the compiler, so long as it
is a syntactically valid expression that can operate on those types
as indicated, it will compile and run as you expect it to. It's up
to you, the developer, to know of any special cases where the actual
underlying data may be invalid.

We've had this very argument in various forms time and time and time
again back in 2014 and 2015. Neither of us will change our position.
It's a waste of time to even discuss it further.

--
Rick C. Hodgin

David Brown

Apr 18, 2018, 4:28:52 PM

The trouble is, your code might give the results you want when you use
it in your unit test code - and fail when you use it in different
circumstances. Then you are left with a function that passes the tests
but fails in practice.

Rick C. Hodgin

Apr 18, 2018, 4:43:32 PM

On 4/18/2018 4:25 PM, Chris Vine wrote:
> Please read the link I sent you. I think it may unbaffle you.

You characterize me repeatedly in a most negative way. I don't
know why. I am not insulting to you or anyone else, and my positions
do not stem from rampant non-thinking behavior. They stem from an
examination of data as data. This clashes with the way C++ views
data through a protocol lens, and there lies the source of the
discrepancy.

I simply do not see how a compiler cannot recognize that a copy from
a source to destination of an 8-byte type is 8 bytes of data being
copied, yet a memcpy() (with a parameter no less) indicating 8 bytes
tells the compiler it is an acceptable form of data exchange.

That's something someone's going to have to explain to me, because
under the hood it's the exact same data being copied. The compiler
should be able to "see" and know that. It is only through protocols
and enforcements the C++ language places on things that it is prevented.
And frankly, I see no value whatsoever in those limitations being
there, hence CAlive's "data is data" model.

--
Rick C. Hodgin

Chris Vine

Apr 18, 2018, 5:00:56 PM

On Wed, 18 Apr 2018 16:43:22 -0400

"Rick C. Hodgin" <rick.c...@gmail.com> wrote:

> On 4/18/2018 4:25 PM, Chris Vine wrote:
> > Please read the link I sent you. I think it may unbaffle you.
>
> You characterize me repeatedly in a most negative way. I don't
> know why. I am not insulting to you or anyone else, and my positions
> do not stem from rampant non-thinking behavior. They stem from an
> examination of data as data. This clashes with the way C++ views
> data through a protocol lens, and there lies the source of the
> discrepancy.

You are imagining things: there was no negative characterization in
"unbaffle". Perhaps something got lost while crossing the Atlantic.
My earlier characterization of your views on the adequacy of tests to
deal with detecting whether undefined behaviour did what was wanted as
"deranged" seems to me to be justified technically, and addressed to
your views and not to you. I chose my words carefully and refrained
from expressing a view on the latter, which I think is pretty kind given
some of your other off-topic posts to this newsgroup. I described you
as "clueless" in relation to what I took to be your intransigent
assertions of the inefficiency of memcpy(), contrary to facts. Maybe
it was unkind; but I think it was fair given that you never seem to
change your views on anything and seem confident of the correctness of
all your views and the incorrectness of everyone else's.

> I simply do not see how a compiler cannot recognize that a copy from
> a source to destination of an 8-byte type is 8 bytes of data being
> copied, yet a memcpy() (with a parameter no less) indicating 8 bytes
> tells the compiler it is an acceptable form of data exchange.

The difference is that reinterpret_cast only changes the type of the
pointer: the dynamic type of the object it points to is still char. So
the dereference on the assignment to 'd' reads a char object through a
uint64_t lvalue, which gives rise to undefined behaviour. std::memcpy
copies the bytes into 'd', whose dynamic type is its declared type
(uint64_t), so all is good.

David Brown

Apr 18, 2018, 5:28:53 PM

On 18/04/18 22:02, Rick C. Hodgin wrote:

> Data is data. It should be viewed as such. I believe that is the
> correct philosophy to look at things like this, and any standard
> which does not correlate something like a cast pointer copying 8
> bytes, compared to a memcpy() which copies 8 bytes, is the insane
> component of that discussion.

That's your problem here - data is /not/ data in a typed language. Data
is organised by types, which say how that data is interpreted and what
you can do with it. It is basically meaningless in a typed language to
try to interpret the data of one type as though it were a different
type, just because it happens to take the same number of bytes in its
representation. Very occasionally you want to go to a lower level with
C or C++, and look at the underlying data - that's why the language
gives you facilities like memcpy and (in C) type-punning unions. But these are
for special circumstances.

It's fine if you want to make your own language typeless or type-unsafe.
But that is not how C or C++ work.

David Brown

Apr 18, 2018, 5:44:30 PM

On 18/04/18 22:27, Rick C. Hodgin wrote:
> On 4/18/2018 4:22 PM, David Brown wrote:
>> On 18/04/18 20:17, Rick C. Hodgin wrote:
>>> It's not zero-overhead. Your proposed memcpy() is iterative, and
>>> operates on the data on a byte-by-byte basis.
>>
>> Logically, yes, memcpy() is byte for byte. In practice, good
>> compilers will optimise memcpy() very nicely when the operands are
>> appropriate. In a case like this, a good compiler (with optimisation
>> enabled, obviously) will do a single 64-bit load on an architecture
>> that supports unaligned loads. It will do its best in other cases -
>> using byte-for-byte loads if needed, or bigger loads if the compiler
>> has some information about the alignment.
>>
>> So the memcpy() solution will be as fast as your version on any target
>> that allows unaligned loads, and /correct/ on all targets regardless
>> of optimisations, flags, compiler variations, etc.
>
> I didn't realize he was simply talking about copying the 8 bytes
> from data to &d, and then using them that way.
>
> And, as you'll see I indicated in a later post, I don't understand
> how a compiler can look at a memcpy() of 8 bytes from the same source
> as a cast pointer dereference copy of the same 8 bytes can be UB in
> one case, and not UB in another.

Okay, let's break that down into steps. First, do you understand how
the compiler can take what is logically a call to a function called
memcpy(), and (when appropriate) turn it into a single 64-bit load? In
many cases, I have seen memcpy() removed altogether and data simply kept
in registers.

Secondly, do you understand that C and C++ do not allow you to access
data of one type through a pointer to a different type, except in
certain specific circumstances? (I am not asking if you think this is a
good idea - merely if you understand the meaning of the text in the
standards. For C, it is at 6.5p7.) And that a key case here is that
you can access any data via a character type pointer.

In that case, you should understand that you are not allowed to access
the character data (from the const char* pointer) as a completely
different type. But you /are/ allowed to access it via a char* pointer
- and you are also allowed to access the uint64_t variable via a char*
pointer. Therefore, the memcpy is allowed but accessing via the cast is
not.
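
The permitted direction can be sketched as follows: looking at the bytes of a uint64_t through an unsigned char pointer is allowed, because character types may alias any object. The function name count_high_bytes is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Counts how many bytes of `value` have the high bit set by inspecting
// the object through an unsigned char pointer, which the aliasing rules
// explicitly allow.
int count_high_bytes(const std::uint64_t& value)
{
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(&value);
    int n = 0;
    for (std::size_t i = 0; i < sizeof value; ++i)
        if (bytes[i] & 0x80u)
            ++n;
    return n;
}
```

The reverse direction, reading char data through a uint64_t pointer, is the case that has no such permission.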

>
> It shows how close behavior is to typing in C++.

Typing is important in C, and critical in C++.

> It is far more
> relaxed in CAlive. Anything you throw at the compiler, so long as it
> is a syntactically valid expression that can operate on those types
> as indicated, it will compile and run as you expect it to. It's up
> to you, the developer, to know of any special cases where the actual
> underlying data may be invalid.

C and C++ are designed to make it difficult to get your types wrong. It
would arguably have been better if pointer casts like this were not
allowed, leading to compile-time errors for this sort of thing.
Unfortunately you simply have to know.

Rick C. Hodgin

Apr 18, 2018, 5:56:48 PM

Yes. I still do not see it. It's all about constraints imposed upon
otherwise legal and valid data movement by a typing system.

CAlive removes that constraint and limitation completely.

--
Rick C. Hodgin

Rick C. Hodgin

Apr 18, 2018, 6:01:51 PM

On 4/18/2018 5:44 PM, David Brown wrote:
> In that case, you should understand that you are not allowed to access
> the character data (from the const char* pointer) as a completely
> different type. But you /are/ allowed to access it via a char* pointer
> - and you are also allowed to access the uint64_t variable via a char*
> pointer. Therefore, the memcpy is allowed but accessing via the cast is
> not.

I recognize it as a limitation to the C++ compiler. I do not
recognize it as any kind of fundamental limitation. It is a
limitation imposed upon otherwise valid data moves for the sake
and limitations of the compiler.

>> It shows how close behavior is to typing in C++.
> Typing is important in C, and critical in C++.

Typing is necessary in CAlive for data manipulation and use,
but CAlive recognizes that the underlying data is data, and
allows it to be accessed at all times as such.

> C and C++ are designed to make it difficult to get your types wrong. It
> would arguably have been better if pointer casts like this were not
> allowed, leading to compile-time errors for this sort of thing.
> Unfortunately you simply have to know.

CAlive does not let you write code that violates types either. But
it allows a re-cast of something to be known from that point forward
to be a re-cast of that type, and then treats whatever is there at
that memory location as such.

I think type constraints of the forms seen in this thread are quite
ludicrous. They belie the true nature of the data operation under
the guise of following protocol.

I think it's a weakness in the language, and one I am glad to leave
behind.

--
Thank you! | Indianapolis, Indiana | God is love -- 1 John 4:7-9
Rick C. Hodgin | http://www.libsf.org/ | http://tinyurl.com/yaogvqhj
-------------------------------------------------------------------------
Software: LSA, LSC, Debi, RDC/CAlive, ES/1, ES/2, VJr, VFrP, Logician
Hardware: Arxoda Desktop CPU, Arxita Embedded CPU, Arlina Compute FPGA

David Brown

Apr 18, 2018, 6:16:31 PM

On 19/04/18 00:01, Rick C. Hodgin wrote:
> On 4/18/2018 5:44 PM, David Brown wrote:
>> In that case, you should understand that you are not allowed to access
>> the character data (from the const char* pointer) as a completely
>> different type. But you /are/ allowed to access it via a char*
>> pointer - and you are also allowed to access the uint64_t variable via
>> a char* pointer. Therefore, the memcpy is allowed but accessing via
>> the cast is not.
>
> I recognize it as a limitation to the C++ compiler.

You mean a limitation of the C++ (and C) language, not the compiler.

> I do not
> recognize it as any kind of fundamental limitation. It is a
> limitation imposed upon otherwise valid data moves for the sake
> and limitations of the compiler.
>
>>> It shows how close behavior is to typing in C++.
>> Typing is important in C, and critical in C++.
>
> Typing is necessary in CAlive for data manipulation and use,
> but CAlive recognizes that the underlying data is data, and
> allows it to be accessed at all times as such.
>
>> C and C++ are designed to make it difficult to get your types wrong.
>> It would arguably have been better if pointer casts like this were not
>> allowed, leading to compile-time errors for this sort of thing.
>> Unfortunately you simply have to know.
>
> CAlive does not let you write code that violates types either. But
> it allows a re-cast of something to be known from that point forward
> to be a re-cast of that type, and then treats whatever is there at
> that memory location as such.
>
> I think type constraints of the forms seen in this thread are quite
> ludicrous. They belie the true nature of the data operation under
> the guise of following protocol.
>
> I think it's a weakness in the language, and one I am glad to leave
> behind.
>

I would say it is a strength in the language, not a weakness. (The
weakness is not getting enough automated help in spotting aliasing
errors - but you know you are doing something dodgy whenever you are
casting pointers to something other than character pointers, so you know
where to be careful.) C and C++ do let you access the underlying data,
but discourage it - and the compiler can assume you are not breaking the
rules.

There is a place for languages with weak typing and a "data is data"
attitude. Assembly is a fine example.

I realise it is highly unlikely that I'd be able to persuade you that
strong typing is a good idea and that limiting low-level manipulation
and access of data outside the type system is important to keeping the
types strong. It is good enough for now if you understand how C and C++
work here.

What you choose for your own language is up to you.

Tim Rentsch

Apr 18, 2018, 11:42:01 PM

wyn...@gmail.com writes:

> Paavo Helde writes
>
>> On 17.04.2018 12:08, wyn...@gmail.com wrote:
>>
>>> Can is_valid2(..) replace is_valid(..)?
>>> I'm concerned about alignment and endian issues on different CPU.
>>>
>>> // data always points to character sequence of length >=8
>>> bool is_valid(const char* data)
>>> {
>>> return (!(data[0]&'\x80'))&&
>>> (!(data[1]&'\x80'))&&
>>> (!(data[2]&'\x80'))&&
>>> (!(data[3]&'\x80'))&&
>>> (!(data[4]&'\x80'))&&
>>> (!(data[5]&'\x80'))&&
>>> (!(data[6]&'\x80'))&&
>>> (!(data[7]&'\x80'));
>>> };
>>>
>>> bool is_valid2(const char* data)
>>> {
>>> return *reinterpret_cast<const uint64_t*>(data)&0x8080808080808080L;
>>> };
>>
>> Alignment mismatch would be a real danger on some platforms.
>>
>> What about this replacement which is also basically one-liner and does
>> not suffer from alignment issues:
>>
>> bool is_valid(const char* data) {
>> return std::find_if(data, data+8,
>> [](char c) {return c&'\x80';})==data+8;
>> }
>
> Quite a few of the features added since C++11 do not seem clearly
> necessary to me. I have been busy figuring out pre-C++11 C++ until now.
> In this case, I'm not sure this lambda expression can beat memcpy.
>
> In general, learning newer C++ features means one has to spend more
> time finding 'bugs' than solving realistic problems (if one studies
> or practices C++ deeply enough).

This answer does have the nice property that it is guaranteed to
have the same semantics as the original function. None of the
other methods suggested do.

What I think is probably more relevant is the expected context of
application. I suspect this function is meant to be used only in
a rather narrow context of application. If that is so, I think
it's worth a look at the context to see if it can be jiggered
around so the functionality desired can be provided more easily
or more cleanly. For example, if this is being called in a loop,
always 8 bytes at a time starting from the beginning of an array,
you might be able to change the buffer to one having uint64_t
elements (but filled using char *'s, which is allowed). Then you
can use the uint64_t elements directly (assuming I understand
what it is you want to do).
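
A rough sketch of that buffer idea, under the assumption that the caller controls how the buffer is allocated and filled. The helpers `make_word_buffer` and `all_valid` are hypothetical names, and zero-padding the final partial word is a choice made here, not something stated above.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: copy `len` bytes of text into a buffer whose elements
// are uint64_t. The final partial word, if any, is zero-padded.
std::vector<std::uint64_t> make_word_buffer(const char* text, std::size_t len)
{
    std::vector<std::uint64_t> words((len + 7) / 8, 0);
    if (len != 0) {
        // Writing the uint64_t objects through a char pointer is permitted.
        char* bytes = reinterpret_cast<char*>(words.data());
        std::memcpy(bytes, text, len);
    }
    return words;
}

// Each element really is a uint64_t, so it can be tested directly.
bool all_valid(const std::vector<std::uint64_t>& words)
{
    for (std::uint64_t w : words) {
        if (w & 0x8080808080808080ULL) return false;
    }
    return true;
}
```

Because the storage is created as uint64_t objects, both the alignment question and the aliasing question go away; the cost is that the data has to be read or copied into such a buffer in the first place.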

Tim Rentsch

Apr 19, 2018, 5:14:57 AM

Chris Vine <chris@cvine--nospam--.freeserve.co.uk> writes:

> On Wed, 18 Apr 2018 14:17:06 -0400, **** <redacted> wrote:
>
>> On 4/18/2018 1:56 PM, Chris Vine wrote:
>
> [snip]
>
>>> You seem to be clueless. What
>>> makes your attitude even more ridiculous is that there is a
>>> zero-overhead way of doing it right.
>>
>> It's not zero-overhead. Your proposed memcpy() is iterative, and
>> operates on the data on a byte-by-byte basis. It is slower than
>> the proposal by the OP, and while yours may be conforming ... who
>> cares if his faster method works? If his goal is to be expressly
>> conforming, then it matters. But if he's targeting a range of
>> tools where it will work using is_valid2() ... then honestly, who
>> cares? Every C++ compiler is different and these things can be
>> wrangled into tests and validated at startup with the simple load
>> of a test case library that calls some functions included in the
>> main executable.
>
> It is not iterative in the sense of a loop. It is an intrinsic/built-in
> which will do a block byte transfer at worst, and if your
> reinterpret_cast does not fail on alignment grounds will be elided
> entirely.
>
> So you propose to recommend undefined behaviour on the ground that the
> one compiler you have tested your reinterpret_cast with (VS) gives the
> correct results, but not to adopt defined behaviour even though (i)
> every compiler you test it with will elide the memcpy away if your
> reinterpret_cast can possibly work, to produce optimal code, and (ii)
> every update to VS or change to the program compile parameters may
> break your version.
>
> The correct version will be just as fast as your incorrect version, if
> compiled with -O or higher and your version actually works.
>
> Your view is deranged.

Are you the same Chris Vine who posted the message quoted below?
Or am I confused?

# Unfortunately there are a number of people who seem equally addicted
# to replying to ****'s posts, thus giving him a further chance to spam
# this newsgroup. Replying to on topic posts seems just as bad from
# this point of view as responding to his off topic posts.

bartc

Apr 19, 2018, 5:43:30 AM

On 18/04/2018 21:02, Rick C. Hodgin wrote:

> Data is data. It should be viewed as such. I believe that is the
> correct philosophy to look at things like this, and any standard
> which does not correlate something like a cast pointer copying 8
> bytes, compared to a memcpy() which copies 8 bytes, is the insane
> component of that discussion.

[Haven't read the rest of the thread.]

I would hope that the assignment and memcpy do different things here:

int64 a;
float64 x = 1.0;

a = x;
memcpy(&a, &x, 8);
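
A small sketch of the two operations, assuming `int64` and `float64` above stand for `std::int64_t` and an IEEE 754 `double`. The function names `by_assignment` and `by_memcpy` are invented for illustration.

```cpp
#include <cstdint>
#include <cstring>

// Value conversion: the number 1.0 becomes the integer 1.
std::int64_t by_assignment(double x)
{
    return static_cast<std::int64_t>(x);
}

// Byte copy: the integer receives the object representation of the double.
std::int64_t by_memcpy(double x)
{
    static_assert(sizeof(std::int64_t) == sizeof(double),
                  "sketch assumes a 64-bit double");
    std::int64_t a;
    std::memcpy(&a, &x, sizeof a);
    return a;
}
```

On an IEEE 754 platform, by_memcpy(1.0) yields the bit pattern 0x3FF0000000000000 rather than 1, which is the difference being pointed at.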

--
bartc

Jorgen Grahn

Apr 19, 2018, 6:08:42 AM

On Tue, 2018-04-17, David Brown wrote:

> On 17/04/18 11:08, wyn...@gmail.com wrote:
>> Can is_valid2(..) replace is_valid(..)?
>> I'm concerned about alignment and endian issues on different CPU.
>>
>> // data always points to character sequence of length >=8
>> bool is_valid(const char* data)
>> {
>> return (!(data[0]&'\x80'))&&
>> (!(data[1]&'\x80'))&&
>> (!(data[2]&'\x80'))&&
>> (!(data[3]&'\x80'))&&
>> (!(data[4]&'\x80'))&&
>> (!(data[5]&'\x80'))&&
>> (!(data[6]&'\x80'))&&
>> (!(data[7]&'\x80'));
>> };

This one might be faster if you avoid &&, with its short-circuiting
properties. Maybe something like:

bool is_valid(const char* data)
{
    auto valid = [data](unsigned n) { ... };
    return valid(0) + valid(1) + ... + valid(7) == 8;
}
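
One possible way to fill in the elided parts, as a sketch only. The lambda body and the name `is_valid_no_branch` are assumptions; whether this is actually faster than the && version depends on the compiler and the data, and would need measuring.

```cpp
// Sketch of a version without short-circuit evaluation.
bool is_valid_no_branch(const char* data)
{
    // Yields 1 when the high bit of byte n is clear, 0 otherwise.
    auto valid = [data](unsigned n) -> unsigned {
        return (static_cast<unsigned char>(data[n]) & 0x80u) == 0 ? 1u : 0u;
    };
    // '+' evaluates every operand, so all eight bytes are always examined.
    return valid(0) + valid(1) + valid(2) + valid(3)
         + valid(4) + valid(5) + valid(6) + valid(7) == 8;
}
```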

>> bool is_valid2(const char* data)
>> {
>> return *reinterpret_cast<const uint64_t*>(data)&0x8080808080808080L;
>> };
>>
>
> Endian issues are not going to be a problem. C and C++ allow for a lot
> of flexibility in the representation of integer types, but uint64_t (and
> similar types) are far stricter.

But it would have been an issue if the pattern wasn't "same value in
all bytes".
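
To illustrate, here is a sketch comparing the symmetric mask with an asymmetric one. All function names are hypothetical, and the byte-order probe assumes the machine is either plain little-endian or plain big-endian.

```cpp
#include <cstdint>
#include <cstring>

// Load 8 bytes into a uint64_t using the machine's native byte order.
std::uint64_t load_native(const unsigned char* bytes)
{
    std::uint64_t v;
    std::memcpy(&v, bytes, sizeof v);
    return v;
}

// Symmetric mask: the same answer on any byte order.
bool any_high_bit(const unsigned char* bytes)
{
    return (load_native(bytes) & 0x8080808080808080ULL) != 0;
}

// Asymmetric mask: 0x80 in the least significant byte only. That byte is
// bytes[0] on a little-endian machine and bytes[7] on a big-endian one.
bool low_byte_high_bit(const unsigned char* bytes)
{
    return (load_native(bytes) & 0x80ULL) != 0;
}

// Runtime byte-order probe, used to interpret the result above.
bool is_little_endian()
{
    const std::uint16_t probe = 1;
    unsigned char first;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}
```

So a test such as "is the first character non-ASCII" would need either a byte-order-dependent mask or a plain data[0] access, whereas the all-bytes test does not care.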

> But alignment /will/ be a problem on some platforms.

Yes.

Personally I recommend staying away from reinterpret_cast (and of
course the old-style C cast). It's too easy to code yourself into
endianness, alignment and aliasing problems.

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .

Rick C. Hodgin

Apr 19, 2018, 6:10:23 AM

The example in this thread is more like:

a = *(uint64_t*)&x;

In that case you do want the underlying bits exactly as they are.

--
Rick C. Hodgin

Jorgen Grahn

Apr 19, 2018, 6:23:58 AM

On Wed, 2018-04-18, wyn...@gmail.com wrote:
...
> Compiler: g++ v7.3.1
> CPU : Intel i5-6500
>
> g++ test.cpp
> Case1 (all valid) : avg=1.2975 faster (than is_valid(..))
> Case2 (8 invalid, 1 invalid): avg=0.5086 faster (slower)
>
> g++ test.cpp -O2
> Case1 (all valid) : avg=1.8934 faster
> Case2 (8 invalid, 1 invalid): avg=1.0723 faster
...
>
> [Conclusion] The memcpy implementation should be compiled with -O2 to
> gain an advantage.

It is pointless to measure the performance of code built with
optimization disabled. The advice you got from people probably
assumed -O2 or better optimization.

wyn...@gmail.com

Apr 19, 2018, 7:40:57 AM

On Thursday, April 19, 2018 at 11:42:01 AM UTC+8, Tim Rentsch wrote:

Yes, it is. But I want it faster, which is why I am asking about is_valid2(..).

Rick C. Hodgin

Apr 19, 2018, 9:24:42 AM

On 4/19/2018 7:40 AM, wyn...@gmail.com wrote:
> ...I want it faster, which is why I am asking about is_valid2(..).

From what I understand, you need to use this form of the function
to be fully C++ compliant without any undefined behavior:

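#include <cstdint>
#include <cstring>

bool is_valid2(const char* data)
{
    std::uint64_t d;
    std::memcpy(&d, data, sizeof d);      // byte copy: no alignment or aliasing problem
    return !(d & 0x8080808080808080ULL);  // true when no byte has its high bit set
}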

The use of memcpy() ensures the value in d is valid in C++.

--
Rick C. Hodgin

Tim Rentsch

Apr 19, 2018, 9:32:53 AM

wyn...@gmail.com writes:

Like I said, looking at the call sites with an eye towards
making external adjustments seems like a good way to get
what you want in this case.

james...@verizon.net

Apr 19, 2018, 2:03:36 PM

On Wednesday, April 18, 2018 at 4:14:02 PM UTC-4, Rick C. Hodgin wrote:
> On 4/18/2018 3:59 PM, james...@verizon.net wrote:
> > On Wednesday, April 18, 2018 at 3:45:07 PM UTC-4, Rick C. Hodgin wrote:

...

> >> My point is look at what your compiler generates and if it generates
> >> proper code ... use it.
> >
> > I barely have time to design, write, and test my code. Checking
> > whether the generated code is "proper" would add weeks of delay to
> > every delivery I made. I wonder, just how small are the programs
> > that you write that you can afford the time needed to review the
> > generated code?
>
> Pretty small. 10s of thousands of lines per app max.
>
> It's not that I review the generated code, it's more when I encounter
> something and I'm not sure how it will work in the compiler, I test it.
> I typically will test things in MSVC++ and GCC (MinGW).

That's the key difference. As long as I know how it will work in the C
code, I couldn't care less how it works in the generated code, so long
as it produces the result that the C code is supposed to produce. If I
don't know how it will work in the C code, that's a very bad sign,
because the only relevant things I don't know about all involve behavior
that is, at best, implementation-defined (and our contract with our
client prohibits me from delivering code with observable implementation-
defined behavior). At worst, it may involve undefined behavior.

Rick C. Hodgin

Apr 19, 2018, 2:19:10 PM

There are very few things I don't understand regarding how they will
work in C (and more so in C++ because I never have learned C++ but only
seen code samples and it made sense to me), but that wasn't always the
case. I came from an assembly background to C, and I've made lots of
assumptions that were valid in asm, and completely invalid in C/C++.

It's actually completely tailored my whole view of data toward the
compiler, hence CAlive removing limitations on type punning in every
form.

Whatever lens you point at data in CAlive, that's how it will be
viewed no matter what was there before. You use the right tool for
the right job, and you (the developer) are able to keep track of
things the compiler could never know.

--
Rick C. Hodgin

Juha Nieminen

Apr 23, 2018, 2:25:57 AM

wyn...@gmail.com wrote:
>> If the pointer is not aligned to a 64-bit integer boundary, it will cause
>> a hardware interrupt to be triggered in some architectures (which in
>> practice means the program will crash). With more permissive architectures
>> it will probably cause a slowdown (of some clock cycles).
>>
>
> Thanks for the information. I know Intel x86 compatible CPU should be
> OK and C++ standard says not OK. But C++'s words implies nearly all
> kinds of CPU. In my cases, I just worry about CPU that can address say
> 2-Gbytes of space, and see if there is chance is_valid2(..) can work.

Btw, if you want a concrete example where the program will crash because
of reading a word breaking alignment boundaries, the UltraSparc processor
is such an example. (It's actually very common in RISC processors, as
they tend to have simplified memory access for efficiency, and will not
have specialized logic to read words across alignment boundaries, and
will simply trigger an interrupt if that's attempted.)
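
For reference, a sketch of how one might check alignment and load a 64-bit word without depending on it. The names `is_aligned_for_u64` and `load_u64` are hypothetical, and the address arithmetic assumes the usual flat mapping from pointers to `std::uintptr_t`, which the standard leaves implementation-defined.

```cpp
#include <cstdint>
#include <cstring>

// True when the address p is a multiple of alignof(std::uint64_t).
bool is_aligned_for_u64(const void* p)
{
    return reinterpret_cast<std::uintptr_t>(p) % alignof(std::uint64_t) == 0;
}

// Alignment-safe load: memcpy has no alignment requirement on its source,
// so this works for any address that has 8 readable bytes behind it.
std::uint64_t load_u64(const char* p)
{
    std::uint64_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
}
```

On a strict-alignment CPU the compiler has to emit byte-wise or otherwise safe loads for load_u64, while on x86 it can usually emit a single unaligned load; either way the source code stays the same.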