Union type punning in C++

Mr Flibble

Jan 1, 2020, 2:24:19 PM

Undefined behaviour aside, is C++ union type punning an acceptable hack or not, given that it is legal in C?
If it works on all the compilers I care about, does it really matter that it is *currently* classed as undefined behaviour?
I would say that everyone has "secretly" fucking done it some time or other, so the C++ Standards Committee should get their act together and legalize it, as has been done in C.
Can anyone provide a straight answer to this?

/Flibble


bol...@nowhere.org

Jan 2, 2020, 4:37:02 AM

On Wed, 1 Jan 2020 19:24:26 +0000
Mr Flibble <flibbleREM...@i42.co.uk> wrote:
>Undefined behaviour aside is C++ union type punning an acceptable hack or not,
>given that it is legal in C?

Using a union or otherwise, type punning is a necessity when doing low-level
programming. If C++ is ever to be used in this scenario then it has to be
supported.

Öö Tiib

Jan 2, 2020, 6:34:15 AM

On Wednesday, 1 January 2020 21:24:19 UTC+2, Mr Flibble wrote:
> Undefined behaviour aside is C++ union type punning an acceptable hack or not, given that it is legal in C?
> If it works on all the compilers I care about does it really matter that it is *currently* classed as undefined behaviour?
> I would say that everyone has "secretly" fucking done it some time or other so the C++ Standards Committee should get their act together and legalize it like it has been in C.
> Can anyone provide a straight answer to this?

To me it seems to be an unneeded feature.

1) Experiments of mine and others have shown that using memcpy
instead of other ways of type punning tends to generate the same or
better code. It is hard to find examples where the other methods are
really beneficial. Can the "de facto" gang provide any? No? I thought
so.
2) Compiler writers say that type punning with a union breaks some
aliasing optimizations, so they have to pessimize code just for that case.
I haven't studied all the details of such claims, but as they are very highly
skilled people I trust them.
3) A whole pile of different methods is unneeded for doing low-level
bit hacks. By the Principle of Least Astonishment it is advisable to
always do those hacks using the same idiom, even when there are multiple
different ways available.

So whichever way the C and C++ committees decide, I will continue
to use memcpy for type punning myself, and I will continue to suggest
that as a rule in the coding standards of projects in which I participate.

This old blog post says basically the same thing as me:
<https://blog.regehr.org/archives/959>

Bo Persson

Jan 2, 2020, 6:49:39 AM

The C++ committee seems unwilling to change the union rules and so
instead adds a bit_cast function to package the memcpy usage with a
slightly better name:

template< class To, class From >
constexpr To bit_cast(const From& from) noexcept;

https://en.cppreference.com/w/cpp/numeric/bit_cast

Bo Persson

Öö Tiib

Jan 2, 2020, 7:52:53 AM

On Thursday, 2 January 2020 13:49:39 UTC+2, Bo Persson wrote:
>
> The C++ committee seems unwilling to change the union rules and so
> instead adds a bit_cast function to package the memcpy usage with a
> slightly better name:
>
> template< class To, class From >
> constexpr To bit_cast(const From& from) noexcept;
>
> https://en.cppreference.com/w/cpp/numeric/bit_cast

Thanks, somehow I hadn't even noticed that. So we can throw away
our own type punning wrappers around memcpy in C++20.

bol...@nowhere.org

Jan 2, 2020, 12:22:52 PM

You realise using a union (or a cast) in theory uses very few CPU cycles to do
type punning, whereas memcpy has the overhead of a function call + loop. It's
an extremely inefficient way to do it if you don't need the new type to be in
a separate variable.

Öö Tiib

Jan 2, 2020, 1:26:46 PM

That is the typical behavior of the gibberish team. In one post it is
"de facto", in another "in theory". I wrote:

Experiments of mine and others have shown that using memcpy
instead of other ways of type punning tends to generate the same or
better code. It is hard to find examples where the other methods are
really beneficial. Can the "de facto" gang provide any? No? I thought
so.

So drop your rubbish and show the assembly of realistic usages
on real compilers where your "theories" hold.

Bo Persson

Jan 2, 2020, 3:12:50 PM

Obviously the bit_cast function will be inlined, as will the
__builtin_memcpy (or whatever it is called for a specific compiler).

And sizeof(To) and sizeof(From) are compile time constants, so no loop
needed to copy, for example, 4 or 8 bytes. Instead we can expect a
single register mov.

I have my favorite example from a std::string constructor using both
strlen and memcpy to construct the string. And the compiler optimizes
this to 1 and 2 mov-instructions, respectively:

std::string whatever = "abcdefgh";
000000013FCC30C3 mov rdx,qword ptr [string "abcdefgh"]
000000013FCC30CA mov qword ptr [whatever],rdx
000000013FCC30D2 mov byte ptr [rsp+347h],0
000000013FCC30DA mov qword ptr [rsp+348h],8
000000013FCC30E6 mov byte ptr [rsp+338h],0

"Here traits_type::copy contains a call to memcpy, which is optimized
into a single register copy of the whole string (carefully selected to
fit). The compiler also transforms a call to strlen into a compile time 8."

https://stackoverflow.com/a/11639305

Note that this example is from 8 years ago. Compilers haven't exactly
become less optimizing since then.

Bo Persson

peter koch

Jan 2, 2020, 3:39:50 PM

On Wednesday, 1 January 2020 at 20:24:19 UTC+1, Mr Flibble wrote:
> Undefined behaviour aside is C++ union type punning an acceptable hack or not, given that it is legal in C?
> If it works on all the compilers I care about does it really matter that it is *currently* classed as undefined behaviour?
> I would say that everyone has "secretly" fucking done it some time or other so the C++ Standards Committee should get their act together and legalize it like it has been in C.
> Can anyone provide a straight answer to this?
>

It does not work with maximum optimization at least with gcc, and it has not done so for at least six years. I know from bitter personal experience (converting from IBM to IEEE floating point).
Instead, just use std::bit_cast - implement it yourself, it is ten lines of code all included. If you compile with optimizations on, the resulting code has no overhead whatsoever.
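
For reference, a hand-rolled stand-in for std::bit_cast along the lines described above might look like the following. This is only a minimal sketch under stated assumptions: the names bit_cast_sketch, flip_sign and bit_cast_demo are made up for illustration, the helper is not constexpr the way the C++20 library function is, and flip_sign assumes a 32-bit IEEE 754 float.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical helper (not from the thread): copies the object
// representation of `from` into a freshly created `To` via memcpy.
template <class To, class From>
To bit_cast_sketch(const From& from) noexcept
{
    static_assert(sizeof(To) == sizeof(From), "To and From must have the same size");
    static_assert(std::is_trivially_copyable<From>::value, "From must be trivially copyable");
    static_assert(std::is_trivially_copyable<To>::value, "To must be trivially copyable");
    static_assert(std::is_default_constructible<To>::value, "To must be default constructible");

    To to;
    std::memcpy(&to, &from, sizeof(To));
    return to;
}

// Example use: flip the sign bit of a float.
// Assumption: float is a 32-bit IEEE 754 value with the sign in the top bit.
inline float flip_sign(float x) noexcept
{
    static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes a 32-bit float");
    std::uint32_t bits = bit_cast_sketch<std::uint32_t>(x);
    bits ^= 0x80000000u;
    return bit_cast_sketch<float>(bits);
}

// Small self-check of the helper.
inline void bit_cast_demo()
{
    assert(flip_sign(1.5f) == -1.5f);
    assert(flip_sign(-2.0f) == 2.0f);
}
```

Because both sizes are compile-time constants, an optimizing compiler would normally be expected to reduce each memcpy here to a register move, but that is worth checking in the generated assembly for the compiler and flags actually in use.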

Chris Vine

Jan 2, 2020, 4:54:41 PM

More to the point, in gcc/clang memcpy is a built-in, and in VS an
"intrinsic", and in most usages does not involve a function call at all
(unless our poster does something like pass it as a function pointer),
as your experiments show. In C++ it is the recommended alternative to
punning with a union. However, I think you are wasting your time
arguing with this guy: from his repetitive posts, he is either dim or a
dick, possibly both.

David Brown

Jan 2, 2020, 5:23:53 PM

memcpy of a known small size will be handled inline by most decent
compilers. Generally it does not lead to any memory accesses or code
other than register movement.

#include <string.h>

// Union punning: valid in C, undefined in standard C++.
float swapsign1(float x) {
    union { float f; unsigned int i; } u;
    u.f = x;
    u.i ^= 0x80000000;
    return u.f;
}

// memcpy punning: valid in both C and C++.
float swapsign2(float x) {
    unsigned int i;
    memcpy(&i, &x, sizeof(float));
    i ^= 0x80000000;
    memcpy(&x, &i, sizeof(float));
    return x;
}

These give identical code with gcc. Both versions are valid in C, but
only swapsign2 is defined by the C++ standards (though gcc defines the
behaviour as an extension).

Chris M. Thomasson

Jan 2, 2020, 6:39:54 PM

On 1/1/2020 11:24 AM, Mr Flibble wrote:
> Undefined behaviour aside is C++ union type punning an acceptable hack
> or not, given that it is legal in C?
> If it works on all the compilers I care about does it really matter that
> it is *currently* classed as undefined behaviour?
> I would say that everyone has "secretly" fucking done it some time or
> other so the C++ Standards Committee should get their act together and
> legalize it like it has been in C.
> Can anyone provide a straight answer to this?

Is there some cryptic hidden thing in the standard that just might
"allow" for the following code to run without ub?
___________________________
#include <iostream>

struct base
{
    int p0;
};

struct object_0
{
    base b;
    int p0;
};

struct object_1
{
    base b;
    int p0;
    int p1;
};

union object
{
    base b;
    object_0 o0;
    object_1 o1;
};

int main() {
    object o;

    o.b.p0 = 41;
    o.o0.b.p0 += 1;
    o.o1.b.p0 += 1;

    // humm
    int r = o.b.p0; // ub...

    std::cout << "r = " << r << "\n";

    return 0;
}
___________________________

Wrt POD and a common base object on each sub-object in a union, well
does it bite the dust with UB?

Öö Tiib

Jan 2, 2020, 7:58:13 PM

On Friday, 3 January 2020 01:39:54 UTC+2, Chris M. Thomasson wrote:
> On 1/1/2020 11:24 AM, Mr Flibble wrote:
> > Undefined behaviour aside is C++ union type punning an acceptable hack
> > or not, given that it is legal in C?
> > If it works on all the compilers I care about does it really matter that
> > it is *currently* classed as undefined behaviour?
> > I would say that everyone has "secretly" fucking done it some time or
> > other so the C++ Standards Committee should get their act together and
> > legalize it like it has been in C.
> > Can anyone provide a straight answer to this?
>
> Is there some cryptic hidden thing in the standard that just might
> "allow" for the following code to run without ub?

Possibly.

> ___________________________
> #include <iostream>
>
> struct base
> {
> int p0;
> };

Standard-layout classes have a lot of additional guarantees.

> struct object_0
> {
> base b;
> int p0;
> };

Standard layout with common initial sequence with base.

> struct object_1
> {
> base b;
> int p0;
> int p1;
> };

Same.

> union object
> {
> base b;
> object_0 o0;
> object_1 o1;
> };

Standard layout union with members that have common initial
sequence.

> int main() {
> object o;
>
> o.b.p0 = 41;

So o.b is now the active member.

> o.o0.b.p0 += 1;

It is valid to read from the inactive member o0, but I am unsure whether
the write by += now makes it active. Standard-layout aggregate types
have piles of privileges. That is a dangerous foot-gun in C++:
a novice can see an idiom that relies on such privileges and shoot
himself in the foot elsewhere.

> o.o1.b.p0 += 1;

Same here.

>
> // humm
> int r = o.b.p0; // ub...

Reading from common initial sequence of non-active standard layout member
is certainly not undefined behavior.

>
> std::cout << "r = " << r << "\n";
>
> return 0;
> }
> ___________________________
>
>
> Wrt POD and a common base object on each sub-object in a union, well
> does it bite the dust with UB?

By my understanding it can easily be that there is no undefined behavior
there at all. What is the motivation for writing code like that? Does it
cause issues with some compilers? Very strange code can sometimes find
its way into the territory of compiler defects. Compilers are written
by humans too.

Chris M. Thomasson

Jan 2, 2020, 8:56:03 PM

On 1/2/2020 4:58 PM, Öö Tiib wrote:
> On Friday, 3 January 2020 01:39:54 UTC+2, Chris M. Thomasson wrote:
>> On 1/1/2020 11:24 AM, Mr Flibble wrote:
>>> Undefined behaviour aside is C++ union type punning an acceptable hack
>>> or not, given that it is legal in C?
>>> If it works on all the compilers I care about does it really matter that
>>> it is *currently* classed as undefined behaviour?
>>> I would say that everyone has "secretly" fucking done it some time or
>>> other so the C++ Standards Committee should get their act together and
>>> legalize it like it has been in C.
>>> Can anyone provide a straight answer to this?
>>
>> Is there some cryptic hidden thing in the standard that just might
>> "allow" for the following code to run without ub?
>
> Possibly.

[...]

> By my understanding it can easily be that it is not undefined behavior
> there at all. What is the motivation of writing code like that?

Mainly to boil things down for the example. Also, one can use the base
object as, say, a sequence counter. The types in the union can be used
for event dispatch.

Say the union object is being represented as an object_1.

object_1* o1 = &o.o1;

o1->b.p0 += 1;
o1->p0 += 1;
o1->p1 += 3;
o1->b.p0 += 1;

Now we can read the sequence count from the base no matter whether we are
working with an object_0 or an object_1.

int seq = o.b.p0;

Öö Tiib

Jan 3, 2020, 3:40:51 AM

On Friday, 3 January 2020 03:56:03 UTC+2, Chris M. Thomasson wrote:
> On 1/2/2020 4:58 PM, Öö Tiib wrote:
> > On Friday, 3 January 2020 01:39:54 UTC+2, Chris M. Thomasson wrote:
> >> On 1/1/2020 11:24 AM, Mr Flibble wrote:
> >>> Undefined behaviour aside is C++ union type punning an acceptable hack
> >>> or not, given that it is legal in C?
> >>> If it works on all the compilers I care about does it really matter that
> >>> it is *currently* classed as undefined behaviour?
> >>> I would say that everyone has "secretly" fucking done it some time or
> >>> other so the C++ Standards Committee should get their act together and
> >>> legalize it like it has been in C.
> >>> Can anyone provide a straight answer to this?
> >>
> >> Is there some cryptic hidden thing in the standard that just might
> >> "allow" for the following code to run without ub?
> >
> > Possibly.
> [...]
> > By my understanding it can easily be that it is not undefined behavior
> > there at all. What is the motivation of writing code like that?
>
> Mainly to boil things down for the example. Also, one can use the base
> object as a, say, sequence counter. The types in the union can be used
> for, an event dispatch.

Hmm. I'm not sure I understand the example problem. What does the counter
count, and why do you not keep it outside of the union?

> Say the union object is being represented as an object_1.
>
> object_1* o1 = &o.o1;
>
> o1->b.p0 += 1;
> o1->p0 += 1;
> o1->p1 += 3;

Above looks ill-formed, diagnostic required.
The p0 and p1 were of class types without operator+=(int) specified.

> o1->b.p0 += 1;
>
> Now, we can read the sequence count base now matter if we are working
> with an object_0 or object_1.
>
> int seq = o.b.p0;

I'm still confused. This is all about some kind of aliases and references
(that likely work fine). But the existence of those does not simplify anything
for me. There is an apparent desire to have multiple members of a union active
at the same time. That also confuses me. For me the question is not whether it
is defined behavior, but why I would need it regardless of whether it
is defined behavior or not.

bol...@nowhere.org

Jan 3, 2020, 4:55:29 AM

On Thu, 2 Jan 2020 10:26:35 -0800 (PST)
Öö Tiib <oot...@hot.ee> wrote:
>On Thursday, 2 January 2020 19:22:52 UTC+2, bol...@nowhere.org wrote:
>> On Thu, 2 Jan 2020 04:52:42 -0800 (PST)
>> Öö Tiib <oot...@hot.ee> wrote:
>> >On Thursday, 2 January 2020 13:49:39 UTC+2, Bo Persson wrote:
>> >>
>> >> The C++ committee seems unwilling to change the union rules and so
>> >> instead adds a bit_cast function to package the memcpy usage with a
>> >> slightly better name:
>> >>
>> >> template< class To, class From >
>> >> constexpr To bit_cast(const From& from) noexcept;
>> >>
>> >> https://en.cppreference.com/w/cpp/numeric/bit_cast
>> >
>> >Thanks, I didn't somehow even notice that. So we can throw away
>> >our own type punning wrappers around memcpy in C++20.
>>
>> You realise using a union (or a cast) in theory uses very few CPU cycles to do
>> type punning, whereas memcpy has the overhead of a function call + loop. Its
>> an extremely inefficient way to do it if you don't need the new type to be in
>> a seperate variable.
>
>That is the typical behavior of gibberish team. In one post it is
>"de facto" in other "in theory". I wrote:
>
>Experiments of mine and others have shown that usage of memcpy

Let's see your test code then.

David Brown

Jan 3, 2020, 5:20:33 AM

You could look at the code I posted for an example of identical union
and memcpy generated code.

Öö Tiib

Jan 3, 2020, 5:21:38 AM

You first. You demonstrated lack of knowledge with the "overhead of a
function call + loop", so now please have the balls to stand behind your
words of nonsense or admit your miserable ignorance and obtain
that clue bat Mr Flibble keeps suggesting. More nonsense does not work
anymore. Besides, Bo Persson and David Brown have already posted some, so it
is likely a waste of effort to dig out and add tests of mine as well.

Mr Flibble

Jan 3, 2020, 9:04:31 AM

I need this ability so I can have a swizzle class.

David Brown

Jan 3, 2020, 9:24:31 AM

On 03/01/2020 15:04, Mr Flibble wrote:

>
> I need this ability so I can have a swizzle class.
>

What exactly do you mean by "swizzle class", and what are you trying to do?

I am entirely confident that you do /not/ need union type punning to
achieve your goals. But it is possible that it would be simpler, clearer
and more efficient than standards-compliant solutions. (This is fine,
of course, if you are happy to restrict your code to compilers for which
the code is valid.) Without more details of what you are trying to do,
I don't see how anyone can help - another thread trying to convince
people to understand what unions can and cannot do in standard C++ is
not particularly useful.

bol...@nowhere.org

Jan 3, 2020, 9:46:34 AM

On Thu, 2 Jan 2020 21:12:40 +0100
Bo Persson <b...@bo-persson.se> wrote:
>"Here traits_type::copy contains a call to memcpy, which is optimized
>into a single register copy of the whole string (carefully selected to
>fit). The compiler also transforms a call to strlen into a compile time 8."

Great. And if the string is say 150 bytes long?

Öö Tiib

Jan 3, 2020, 9:47:33 AM

Ok. Then you need to make your swizzle class a "standard-layout class"
([class] 7) with all of its non-static data members being part of the
"common initial sequence".

[class.mem] 23
"In a standard-layout union with an active member (12.3) of struct
type T1, it is permitted to read a non-static data member m of
another union member of struct type T2 provided m is part of the
common initial sequence of T1 and T2; the behavior is as if the
corresponding member of T1 were nominated."

And from [class.union] I have a strong impression that writing into
such a union member just ends the lifetime of the previous active member
and starts the lifetime of the member written to, without any
cleanup or initialization being done.
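
To make the quoted [class.mem] rule concrete, here is a minimal sketch of a read through an inactive union member that stays within the common initial sequence. All names (event_a, event_b, event, read_seq_via_inactive_member) are hypothetical and not taken from the thread or from neoGFX; the sketch only covers the read case, not the question of what a write does to the active member.

```cpp
#include <cassert>
#include <type_traits>

// Two standard-layout structs whose common initial sequence is the
// single member `seq` (int vs float stops the sequence after it).
struct event_a { int seq; int payload; };
struct event_b { int seq; float x; float y; };

union event {
    event_a a;
    event_b b;
};

static_assert(std::is_standard_layout<event_a>::value, "event_a must be standard-layout");
static_assert(std::is_standard_layout<event_b>::value, "event_b must be standard-layout");
static_assert(std::is_standard_layout<event>::value, "event must be standard-layout");

inline int read_seq_via_inactive_member()
{
    // Aggregate initialization of a union initializes its first member,
    // so `a` is the active member here.
    event e = { event_a{ 7, 42 } };

    // `b` is not active, but `seq` is part of the common initial
    // sequence of event_a and event_b, so this read is permitted.
    // Reading e.b.x here would NOT be covered by that rule.
    return e.b.seq;
}
```

The static_asserts are there because the rule only applies to standard-layout types; adding a virtual function or mixed access control to either struct would make them fire.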

bol...@nowhere.org

Jan 3, 2020, 9:49:22 AM

On Thu, 2 Jan 2020 21:54:05 +0000
Chris Vine <chris@cvine--nospam--.freeserve.co.uk> wrote:
>More to the point, in gcc/clang memcpy is a built-in, and in VS an

Built-in doesn't mean loop free and no stack push/pops. It just means the
compiler doesn't have to link in code from a library but can generate it on
the spot.

>arguing with this guy: from his repetitive posts, he is either dim or a
>dick, possibly both.

Ad Hominem seems to be a common theme from you assclowns when you can't
actually argue your point.

bol...@nowhere.org

Jan 3, 2020, 9:55:21 AM

On Fri, 3 Jan 2020 02:21:28 -0800 (PST)

You made the claim so either back it up or fuck off back under your bridge.

Mr Flibble

Jan 3, 2020, 9:57:50 AM

No that won't help me with my swizzle class.

Bonita Montero

Jan 3, 2020, 10:04:20 AM

>> You first. You demonstrated lack of knowledge with the "overhead of a

> You made the claim so either back it up or fuck off back under your bridge.

I just did

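#include <cstring>
#include <cstdint>

uint64_t native( double d )
{
#if defined(UNION)
    // Union punning: valid in C, undefined behaviour in standard C++.
    union
    {
        double du;
        uint64_t dx;
    };
    du = d;
    return dx;
#else
    // memcpy punning: well defined in both C and C++.
    uint64_t n;
    memcpy( &n, &d, sizeof(n) );
    return n;
#endif
}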

With gcc and without UNION:

_Z6natived:
movq %xmm0, %rax
ret

with UNION:

_Z6natived:
movq %xmm0, %rax
ret

With MSVC and without UNION:

?native@@YA_KN@Z PROC
movsd QWORD PTR [rsp+8], xmm0
mov rax, QWORD PTR d$[rsp]
ret 0
?native@@YA_KN@Z ENDP

With UNION:

?native@@YA_KN@Z PROC
movsd QWORD PTR $S1$[rsp], xmm0
mov rax, QWORD PTR $S1$[rsp]
ret 0
?native@@YA_KN@Z ENDP

Öö Tiib

Jan 3, 2020, 10:35:00 AM

Being vulgar only makes you look more pathetic. You started by
demonstrating lack of knowledge with the "overhead of a
function call + loop". So now please have the balls to stand behind
your words of nonsense or to admit your miserable ignorance and
obtain that clue bat Mr Flibble keeps suggesting. Sad obscenities do
not help you. :D

Öö Tiib

Jan 3, 2020, 10:38:15 AM

Why? Can it be rewritten in a way that will help with it or are
there some fundamental problems?

David Brown

Jan 3, 2020, 11:12:04 AM

On 03/01/2020 15:49, bol...@nowhere.org wrote:
> On Thu, 2 Jan 2020 21:54:05 +0000
> Chris Vine <chris@cvine--nospam--.freeserve.co.uk> wrote:
>> More to the point, in gcc/clang memcpy is a built-in, and in VS an
>
> Built-in doesn't mean loop free and no stack push/pops. It just means the
> compiler doesn't have to link in code from a library but can generate it on
> the spot.

If your objects are big enough that they need to go on the stack instead
of registers, then yes - you will get stack usage. And if they are big
enough for a loop to make sense, you will get a loop. But if they are
small enough, as they often are, they will stay entirely in registers -
whether you use unions or memcpy.

Perhaps you think that because memcpy takes addresses as parameters, it
means the data has to be in ram? If so, then you are wrong.

Of course, you'd know this if you had bothered to try any examples -
such as the code I posted.

I recommend you go to <https://godbolt.org>. It is an online compiler,
with lots of different compilers (different versions of gcc, clang,
msvc, icc, etc.). Put in some code with memcpy of local variables,
compile with optimisation (gcc -O2 or msvc /O2 for example), and /learn/
something.

bol...@nowhere.org

Jan 3, 2020, 11:12:48 AM

On Fri, 3 Jan 2020 07:34:50 -0800 (PST)
Öö Tiib <oot...@hot.ee> wrote:
>On Friday, 3 January 2020 16:55:21 UTC+2, bol...@nowhere.org wrote:
>> You made the claim so either back it up or fuck off back under your bridge.
>
>Being vulgar makes you to look only more pathetic. You started by

Ooh get you!

>demonstrating lack of knowledge with the "overhead of a
>function call + loop" first. So now please have balls to stand behind
>your words of nonsense or to admit your miserable ignorance and
>obtain that clue bat Mr Flibble keeps suggesting. Sad obscenities do
>not help you. :D

So you're saying using memcpy() never creates a function call or loop
because it's built in? Ok, here's what Clang has to say about it:

#include <string.h>

void func()
{
    char s[100];
    char d[100];
    memcpy(&d, &s, 100);
}

_func: ## @func
.cfi_startproc
## %bb.0:
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset %rbp, -16
movq %rsp, %rbp
.cfi_def_cfa_register %rbp
subq $224, %rsp
movl $100, %eax
movl %eax, %edx
movq ___stack_chk_guard@GOTPCREL(%rip), %rcx
movq (%rcx), %rcx
movq %rcx, -8(%rbp)
leaq -224(%rbp), %rcx
leaq -112(%rbp), %rsi
movq %rcx, %rdi
callq _memcpy
movq ___stack_chk_guard@GOTPCREL(%rip), %rcx
movq (%rcx), %rcx
movq -8(%rbp), %rdx
cmpq %rdx, %rcx
jne LBB0_2

callq memcpy? Oops, that looks like a call to me. Sure, if the length is
short it will optimise it out to just movs, but that criterion wasn't
mentioned.

Öö Tiib

Jan 3, 2020, 11:49:34 AM

On Friday, 3 January 2020 18:12:48 UTC+2, bol...@nowhere.org wrote:
> On Fri, 3 Jan 2020 07:34:50 -0800 (PST)
> Öö Tiib <oot...@hot.ee> wrote:
> >On Friday, 3 January 2020 16:55:21 UTC+2, bol...@nowhere.org wrote:
> >> You made the claim so either back it up or fuck off back under your bridge.
> >
> >Being vulgar makes you to look only more pathetic. You started by
>
> Ooh get you!
>
> >demonstrating lack of knowledge with the "overhead of a
> >function call + loop" first. So now please have balls to stand behind
> >your words of nonsense or to admit your miserable ignorance and
> >obtain that clue bat Mr Flibble keeps suggesting. Sad obscenities do
> >not help you. :D
>
> So you're saying using memcpy() never creates a function call or loop
> because it's built in?

No, I did not say that. Are you straw-manning out of habit, or did being
pathetically profane affect your capability to read? I said that type
punning using memcpy does generate more correct code that is as efficient
or more efficient than other methods with modern (less than 15 years old)
C++ compilers when compiled with optimizations enabled.

Mr Flibble

Jan 3, 2020, 12:06:30 PM

You can find my swizzle base class here:

https://github.com/i42output/neoGFX/blob/master/include/neogfx/core/swizzle_array.hpp

used by my swizzle class here:

https://github.com/i42output/neoGFX/blob/master/include/neogfx/core/swizzle.hpp

used by my vector class here:

https://github.com/i42output/neoGFX/blob/master/include/neogfx/core/numerical.hpp

Chris Vine

Jan 3, 2020, 12:55:46 PM

On Fri, 3 Jan 2020 14:49:12 +0000 (UTC)
bol...@nowhere.org wrote:
> On Thu, 2 Jan 2020 21:54:05 +0000
> Chris Vine <chris@cvine--nospam--.freeserve.co.uk> wrote:
> >More to the point, in gcc/clang memcpy is a built-in, and in VS an
>
> Built-in doesn't mean loop free and no stack push/pops. It just means the
> compiler doesn't have to link in code from a library but can generate it on
> the spot.

You said "memcpy has the overhead of a function call". When used in
place of type punning it doesn't: all the code is inline, no new stack
frame is created and no call instruction is issued. You said "It's an
extremely inefficient way to do it if you don't need the new type to be
in a separate variable". You are completely wrong and clueless.

> >arguing with this guy: from his repetitive posts, he is either dim or a
> >dick, possibly both.
>
> Ad Hominem seems to be a common theme from you assclowns when you can't
> actually argue your point.

"Ad Hominem seems to be a common theme from you assclowns when you
can't actually argue your point": a classic of its kind from you. To
see you complain about ad hominem remarks is excellent given that:

(i) as I have previously pointed out to you, almost all of your posts are
wrong;

(ii) numerous people who know far more about it than you have
explained repeatedly where and why you are wrong - the problem is not
that your correspondents don't argue the point (they do) but that you
seem unable to learn anything at all; and

(iii) in consequence, your first reflex to any post in response to one
of yours which points out your errors is to issue an insult.

Bonita Montero

Jan 3, 2020, 1:29:08 PM

Didn't you use optimizations?

Look at this code:

#include <cstddef>
#include <cstring>

std::size_t const N = 200;

struct S
{
    char c[N];
};

void f( S *a, S *b )
{
    std::memcpy( a->c, b->c, N );
}

This is the MSVC-code:

movups xmm0, XMMWORD PTR [rdx]
movups XMMWORD PTR [rcx], xmm0
movups xmm1, XMMWORD PTR [rdx+16]
movups XMMWORD PTR [rcx+16], xmm1
movups xmm0, XMMWORD PTR [rdx+32]
movups XMMWORD PTR [rcx+32], xmm0
movups xmm1, XMMWORD PTR [rdx+48]
movups XMMWORD PTR [rcx+48], xmm1
movups xmm0, XMMWORD PTR [rdx+64]
movups XMMWORD PTR [rcx+64], xmm0
movups xmm1, XMMWORD PTR [rdx+80]
movups XMMWORD PTR [rcx+80], xmm1
movups xmm0, XMMWORD PTR [rdx+96]
movups XMMWORD PTR [rcx+96], xmm0
sub rcx, -128
movups xmm0, XMMWORD PTR [rdx+112]
movups XMMWORD PTR [rcx-16], xmm0
movups xmm1, XMMWORD PTR [rdx+128]
movups XMMWORD PTR [rcx], xmm1
movups xmm0, XMMWORD PTR [rdx+144]
movups XMMWORD PTR [rcx+16], xmm0
movups xmm1, XMMWORD PTR [rdx+160]
movups XMMWORD PTR [rcx+32], xmm1
movups xmm0, XMMWORD PTR [rdx+176]
movups XMMWORD PTR [rcx+48], xmm0
mov rax, QWORD PTR [rdx+192]
mov QWORD PTR [rcx+64], rax
ret 0

This is the gcc-code:

movq (%rsi), %rax
movq %rdi, %rcx
leaq 8(%rdi), %rdi
movq %rax, -8(%rdi)
movq 192(%rsi), %rax
movq %rax, 184(%rdi)
andq $-8, %rdi
subq %rdi, %rcx
subq %rcx, %rsi
addl $200, %ecx
shrl $3, %ecx
rep movsq
ret

Keith Thompson

Jan 3, 2020, 1:32:49 PM

Öö Tiib <oot...@hot.ee> writes:
> On Friday, 3 January 2020 16:55:21 UTC+2, bol...@nowhere.org wrote:

[...]

>> You made the claim so either back it up or fuck off back under your bridge.
>
> Being vulgar makes you to look only more pathetic.

And, frankly, responding doesn't make you look good. As long as you
keep engaging with trolls, they have no incentive to stop trolling.

--
Keith Thompson (The_Other_Keith) Keith.S.T...@gmail.com
[Note updated email address]
Working, but not speaking, for Philips Healthcare
void Void(void) { Void(); } /* The recursive call of the void */

Öö Tiib

Jan 3, 2020, 1:43:41 PM

On Friday, 3 January 2020 20:32:49 UTC+2, Keith Thompson wrote:
> Öö Tiib <oot...@hot.ee> writes:
> > On Friday, 3 January 2020 16:55:21 UTC+2, bol...@nowhere.org wrote:
> [...]
> >> You made the claim so either back it up or fuck off back under your bridge.
> >
> > Being vulgar makes you to look only more pathetic.
>
> And, frankly, responding doesn't make you look good. As long as you
> keep engaging with trolls, they have no incentive to stop trolling.

Yeah, seems that I waste my own (and possibly other people's) time
since boltrol can not reason coherently.

Melzzzzz

Jan 3, 2020, 1:54:40 PM

For small blocks, movs instructions are better; for big ones, the
opposite.


Melzzzzz

Jan 3, 2020, 1:55:35 PM

Perhaps he is just dumb?

Bo Persson

Jan 3, 2020, 2:01:26 PM

So you put a 150 byte string in a union? And then access it as some
other type? And there is no loop?!

You stated that using memcpy is bad because it *always* includes a
function call and a loop. I showed you one case where it doesn't. That
takes away the *always* from the argument.

Wanna guess how the compiler handles other small types, like int, long,
float, or double?

Bo Persson

Öö Tiib

Jan 3, 2020, 2:05:49 PM

Here is a union containing swizzles, but it also contains

array_type v;

It has a common initial sequence with them, since it is the same type as
swizzle::array_type.

struct // todo: alignment, padding?
{
    value_type x;
    value_type y;
    value_type z;
};

Get rid of that struct, or replace it with your swizzles.

> used by my swizzle class here:
>
> https://github.com/i42output/neoGFX/blob/master/include/neogfx/core/swizzle.hpp

Hmm. Let's see:

| A class S is a standard-layout class if it:
| —(7.1) has no non-static data members of type non-standard-layout class (or array of such types) or reference,

Check. Unless your swizzle::array_type is non-standard-layout class.

| —(7.2) has no virtual functions (13.3) and no virtual base classes (13.1),

Check.

| —(7.3) has the same access control (Clause 14) for all non-static data members,

Check. Public.

| —(7.4) has no non-standard-layout base classes,
| —(7.5) has at most one base class subobject of any given type,
| —(7.6) has all non-static data members and bit-fields in the class and its base classes first declared in the same class, and
| —(7.7) has no element of the set M(S) of types (defined below) as a base class. M(X) is defined as follows: etc.

Check. No base classes.
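
A checklist like this can also be handed to the compiler. The following is only a minimal sketch: `xyz` is a hypothetical stand-in for the anonymous struct above (with `float` assumed for `value_type`), and `with_virtual` is an invented counter-example that fails rule (7.2). Neither is taken from neoGFX.

```cpp
#include <type_traits>

// Hypothetical stand-in for the anonymous { x, y, z } struct discussed above.
struct xyz
{
    float x;
    float y;
    float z;
};

// Invented counter-example: a virtual function violates rule (7.2).
struct with_virtual
{
    virtual ~with_virtual() = default;
    float x;
};

// The compiler performs the same check as the manual walk through (7.1)-(7.7).
static_assert(std::is_standard_layout<xyz>::value,
              "xyz is expected to be standard-layout");
static_assert(!std::is_standard_layout<with_virtual>::value,
              "a class with virtual functions is not standard-layout");
```

Putting such a `static_assert` next to the real union members would catch a later change that silently breaks the standard-layout requirement.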

Chris M. Thomasson

unread,

Jan 3, 2020, 5:34:47 PM1/3/20

to

On 1/3/2020 6:24 AM, David Brown wrote:
> On 03/01/2020 15:04, Mr Flibble wrote:
>
>>
>> I need this ability so I can have a swizzle class.
>>
>
> What exactly do you mean by "swizzle class", and what are you trying to do?

Are you familiar with the GLSL language? Think of something along the
lines of:
________________
vec2 foo = vec2(1., 2.);
vec3 bar = vec3(foo.xy, 3.);
vec4 foobar = vec4(foo.xy, bar.yz);

foobar = { 1., 2., 2., 3. }
________________

> I am entirely confident that you do /not/ need union type punning to
> achieve your goals. But it is possible that it would simpler, clearer
> and more efficient than standards-compliant solutions. (This is fine,
> of course, if you are happy to restrict your code to compilers for which
> the code is valid.) Without more details of what you are trying to do,
> I don't see how anyone can help - another thread trying to convince
> people to understand what unions can and cannot do in standard C++ is
> not particularly useful.
>

Afaict, it's going to be rather "tedious" to get it done in C++. Humm...
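
For what it is worth, here is a rough sketch of how the GLSL snippet could be mirrored without any union, using member functions for the swizzles. All names (`vec2`, `vec3`, `vec4`, `make_foobar`) are made up for illustration, and the price is call syntax: `foo.xy()` instead of `foo.xy`, which is exactly the syntax cost being argued about in this thread.

```cpp
// Hypothetical vector types; swizzles are read-only member functions.
struct vec2
{
    float x, y;
    vec2 xy() const { return *this; }
};

struct vec3
{
    float x, y, z;
    vec3(vec2 a, float z_) : x(a.x), y(a.y), z(z_) {}
    vec2 xy() const { return vec2{ x, y }; }
    vec2 yz() const { return vec2{ y, z }; }
};

struct vec4
{
    float x, y, z, w;
    vec4(vec2 a, vec2 b) : x(a.x), y(a.y), z(b.x), w(b.y) {}
};

// Mirrors the GLSL example: foobar ends up as { 1, 2, 2, 3 }.
inline vec4 make_foobar()
{
    vec2 foo{ 1.f, 2.f };
    vec3 bar(foo.xy(), 3.f);
    return vec4(foo.xy(), bar.yz());
}
```

This only covers reading swizzles. Assignable swizzles such as `v.xy = ...` need proxy objects or the union approach under discussion.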

Chris M. Thomasson

unread,

Jan 3, 2020, 5:38:08 PM1/3/20

to

These are using the same POD structs in my original example:
____________________________
struct base
{
int p0;
};

struct object_0
{
base b;
int p0;
};

struct object_1
{
base b;
int p0;
int p1;
};

union object
{
base b;
object_0 o0;
object_1 o1;
};
____________________________

Everything boils down to int types, so no special operators need to be
defined.
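
To make the permitted case concrete, here is a small sketch built on the structs above. It relies on the common initial sequence rule for standard-layout structs in a union: `object_0` and `object_1` both start with `base b; int p0;`, so that prefix may be read through either member. The function name `read_prefix_sum` is invented for the example.

```cpp
struct base     { int p0; };
struct object_0 { base b; int p0; };
struct object_1 { base b; int p0; int p1; };

union object
{
    base b;
    object_0 o0;
    object_1 o1;
};

// Writes through o1, then inspects the shared prefix through o0.
inline int read_prefix_sum()
{
    object u;
    u.o1 = object_1{ base{ 1 }, 2, 3 };  // o1 is now the active member
    // object_0 and object_1 share the initial sequence { base b; int p0; },
    // so reading those two members through o0 is allowed.
    return u.o0.b.p0 + u.o0.p0;
}
```

Reading `u.o0` members beyond the shared prefix, or punning between unrelated types such as `double` and `uint64_t`, is not covered by this rule.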

Chris M. Thomasson

unread,

Jan 3, 2020, 5:45:58 PM1/3/20

to

On 1/3/2020 7:04 AM, Bonita Montero wrote:
>>> You first. You demonstrated lack of knowledge with the "overhead of a
>
>> You made the claim so either back it up or fuck off back under your
>> bridge.
>
> I just did
>
> #include <cstring>
> #include <cstdint>
>
> uint64_t native( double d )
> {
> #if defined(UNION)
>     union
>     {
>         double   du;
>         uint64_t dx;
>     };
>     du = d;
>     return dx;

[...]

It seems like a static_assert to ensure that a double is the same size
as a uint64_t is in order? Or native just knows what it's doing, and is
not meant to be portable?
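
Something along these lines is what I have in mind. It is only a sketch of the memcpy variant with the size check added; the names `bits_of` and `double_from` are invented here and are not from the quoted code.

```cpp
#include <cstdint>
#include <cstring>

// Refuse to compile on platforms where the pun would not make sense.
static_assert(sizeof(double) == sizeof(std::uint64_t),
              "double and uint64_t must have the same size");

// Returns the object representation of d as a 64-bit integer.
inline std::uint64_t bits_of(double d)
{
    std::uint64_t u;
    std::memcpy(&u, &d, sizeof u);
    return u;
}

// Inverse of bits_of.
inline double double_from(std::uint64_t u)
{
    double d;
    std::memcpy(&d, &u, sizeof d);
    return d;
}
```

The size check costs nothing at run time, and compilers typically turn a fixed-size memcpy like this into a plain register move.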

Chris M. Thomasson

unread,

Jan 3, 2020, 5:46:49 PM1/3/20

to

On 1/3/2020 8:12 AM, bol...@nowhere.org wrote:
> On Fri, 3 Jan 2020 07:34:50 -0800 (PST)
> Öö Tiib <oot...@hot.ee> wrote:
>> On Friday, 3 January 2020 16:55:21 UTC+2, bol...@nowhere.org wrote:
>>> You made the claim so either back it up or fuck off back under your bridge.
>>
>> Being vulgar makes you to look only more pathetic. You started by
>
> Ooh get you!
>
>> demonstrating lack of knowledge with the "overhead of a
>> function call + loop" first. So now please have balls to stand behind
>> your words of nonsense or to admit your miserable ignorance and
>> obtain that clue bat Mr Flibble keeps suggesting. Sad obscenities do
>> not help you. :D
>
> So you're saying using memcpy() never creates a function call or loop

> because its built in?[...]

Did he say that? Where?

Melzzzzz

unread,

Jan 3, 2020, 5:59:32 PM1/3/20

to

Or at all needed...

Mr Flibble

unread,

Jan 3, 2020, 7:11:25 PM1/3/20

to

On 03/01/2020 22:59, Melzzzzz wrote:
> On 2020-01-03, Chris M. Thomasson <chris.m.t...@gmail.com> wrote:
>> On 1/3/2020 6:24 AM, David Brown wrote:
>>> On 03/01/2020 15:04, Mr Flibble wrote:
>>>
>>>>
>>>> I need this ability so I can have a swizzle class.
>>>>
>>>
>>> What exactly do you mean by "swizzle class", and what are you trying to do?
>>
>> Are you familiar with the GLSL language? Think of something along the
>> lines of:
>> ________________
>> vec2 foo = vec2(1., 2.);
>> vec3 bar = vec3(foo.xy, 3.);
>> vec4 foobar = vec4(foo.xy, bar.yz);
>>
>> foobar = { 1., 2., 2., 3. }
>> ________________
>>
>>
>>> I am entirely confident that you do /not/ need union type punning to
>>> achieve your goals. But it is possible that it would simpler, clearer
>>> and more efficient than standards-compliant solutions. (This is fine,
>>> of course, if you are happy to restrict your code to compilers for which
>>> the code is valid.) Without more details of what you are trying to do,
>>> I don't see how anyone can help - another thread trying to convince
>>> people to understand what unions can and cannot do in standard C++ is
>>> not particularly useful.
>>>
>>
>> Afaict, its going to be rather "tedious" to get it done in C++. Humm...
>
> Or at all needed...

It is needed to make client-site code as simple/clean/elegant as possible:

clean:

v.x = 42; f(v.x)

not-clean:

v.set_x(42); f(v.x());

Melzzzzz

unread,

Jan 3, 2020, 7:21:14 PM1/3/20

to

It's just syntax. Preference.
Object vs functional style.
We all know that it is advisable not to touch member variables directly.
When did that change?

>
> /Flibble

Bonita Montero

unread,

Jan 3, 2020, 7:25:31 PM1/3/20

to

> For small blocks movs instructions are better.
> For big ones opposite.

The compiler knows better than stupid Melzzz.

Mr Flibble

unread,

Jan 3, 2020, 7:26:26 PM1/3/20

to

There isn't a class invariant so it is no different to using a "struct" (yes, I know there isn't much difference between "struct" and "class" in C++).

Bonita Montero

unread,

Jan 3, 2020, 7:26:40 PM1/3/20

to

>>> You made the claim so either back it up or fuck off back under your
>>> bridge.
>>
>> I just did
>>
>> #include <cstring>
>> #include <cstdint>
>>
>> uint64_t native( double d )
>> {
>> #if defined(UNION)
>>      union
>>      {
>>          double   du;
>>          uint64_t dx;
>>      };
>>      du = d;
>>      return dx;
> [...]

> It seems like a static_assert to ensure that a double is the same size

> as a unit64_t is in order? ...

LOL, in that _experimental_code_ just to prove
that memcpy() is translated into efficient moves?

Melzzzzz

unread,

Jan 3, 2020, 7:28:11 PM1/3/20

to

Nope. These two don't agree but:
~/examples/assembler >>> ./rdtscp 200 [32]
200 128 byte blocks, loops:20000
rep movsb 0.00000054797211
rep movsq 0.00000054844105
movntdq 0.00000065490963
movntdq prefetch 0.00000063764716
movntdq prefetch ymm 0.00000063049784
~/examples/assembler >>> ./rdtscp 4 [32]
4 128 byte blocks, loops:1000000
rep movsb 0.00000004582558
rep movsq 0.00000002845861
movntdq 0.00000008605540
movntdq prefetch 0.00000008531304
movntdq prefetch ymm 0.00000008636872

see, it's all empirical...

Bonita Montero

unread,

Jan 3, 2020, 7:42:11 PM1/3/20

to

>>> For small blocks movs instructions are better.
>>> For big ones opposite.

>> The compiler knows better than stupid Melzzz.

> Nope. These two don's agree but:
> ~/examples/assembler >>> ./rdtscp 200 [32]
> 200 128 byte blocks, loops:20000
> rep movsb 0.00000054797211
> rep movsq 0.00000054844105
> movntdq 0.00000065490963
> movntdq prefetch 0.00000063764716
> movntdq prefetch ymm 0.00000063049784
> ~/examples/assembler >>> ./rdtscp 4 [32]
> 4 128 byte blocks, loops:1000000
> rep movsb 0.00000004582558
> rep movsq 0.00000002845861
> movntdq 0.00000008605540
> movntdq prefetch 0.00000008531304
> movntdq prefetch ymm 0.00000008636872
> see, it's all empyrical...

That's not as easy to test as you might think.

Melzzzzz

unread,

Jan 3, 2020, 7:52:24 PM1/3/20

to

On 2020-01-04, Bonita Montero <Bonita....@gmail.com> wrote:

It is not easy, that is why I wrote tests...

Chris M. Thomasson

unread,

Jan 3, 2020, 7:55:28 PM1/3/20

to

Color me pedantic... ;^)

Melzzzzz

unread,

Jan 3, 2020, 8:02:25 PM1/3/20

to

More than once I have seen examples where the compiler (gcc) returned
garbage in different scenarios. That code screams 'volatile' ;)

Chris M. Thomasson

unread,

Jan 3, 2020, 8:06:45 PM1/3/20

to

On 1/3/2020 5:02 PM, Melzzzzz wrote:
> On 2020-01-04, Chris M. Thomasson <chris.m.t...@gmail.com> wrote:
>> On 1/3/2020 4:26 PM, Bonita Montero wrote:
>>>>>> You made the claim so either back it up or fuck off back under your
>>>>>> bridge.
>>>>>
>>>>> I just did
>>>>>
>>>>> #include <cstring>
>>>>> #include <cstdint>
>>>>>
>>>>> uint64_t native( double d )
>>>>> {
>>>>> #if defined(UNION)
>>>>>      union
>>>>>      {
>>>>>          double   du;
>>>>>          uint64_t dx;
>>>>>      };
>>>>>      du = d;
>>>>>      return dx;
>>>> [...]
>>>
>>>> It seems like a static_assert to ensure that a double is the same size
>>>> as a unit64_t is in order? ...
>>>
>>> LOL, in that _experimental_code_ just to prove
>>> that memcpy() is translated into efficient moves?
>>
>> Color me pendant... ;^)
> Not once I saw examples when compiler (gcc) returned garbage
> in different scenarios. That code screams 'volatile' ;)
>

Actually, portability be damned, I used 64 bit atomics to work with
doubles that happened to be of the same size. It was a fast and dirty
way to work on doubles atomically. Well, I only used it for atomic swap,
compare-exchange, load and store. NO fetch-and-add. It would destroy things.
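
Roughly, the trick looks like the sketch below. The class name `atomic_double_bits` is invented for this post and it assumes double and uint64_t have the same size. Note that compare-exchange compares bit patterns, so +0.0 and -0.0 count as different and NaNs only match with identical payloads.

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == sizeof(std::uint64_t),
              "double and uint64_t must have the same size");

// Hypothetical wrapper: stores a double inside a 64-bit atomic integer.
class atomic_double_bits
{
    std::atomic<std::uint64_t> bits_;

    static std::uint64_t to_bits(double d)
    {
        std::uint64_t u;
        std::memcpy(&u, &d, sizeof u);
        return u;
    }

    static double from_bits(std::uint64_t u)
    {
        double d;
        std::memcpy(&d, &u, sizeof d);
        return d;
    }

public:
    explicit atomic_double_bits(double d = 0.0) : bits_(to_bits(d)) {}

    double load() const { return from_bits(bits_.load()); }
    void store(double d) { bits_.store(to_bits(d)); }
    double exchange(double d) { return from_bits(bits_.exchange(to_bits(d))); }

    // Compares bit patterns, not floating-point values.
    bool compare_exchange(double& expected, double desired)
    {
        std::uint64_t e = to_bits(expected);
        bool ok = bits_.compare_exchange_strong(e, to_bits(desired));
        if (!ok)
            expected = from_bits(e);  // report the value actually seen
        return ok;
    }
};
```

An integer fetch-and-add on the stored bits would add to the raw representation rather than to the floating-point value, which is why it is left out.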

Chris M. Thomasson

unread,

Jan 3, 2020, 9:47:37 PM1/3/20

to

The cleaner the better, wrt trying to port GLSL code to its C++ "equal".
It can be done, and a nice std swizzle would be great. Now, going from
C++ to GLSL is easy, in a sense. The other way around, sometimes, get
ready for fun! ;^o

> clean:
>
> v.x = 42; f(v.x)
>
> not-clean:
>
> v.set_x(42); f(v.x());

I see, indeed.

Chris M. Thomasson

unread,

Jan 3, 2020, 10:07:51 PM1/3/20

to

[...]

I have to admit that this is sort of "selfish". Imvvho, I would not
object to C++ supporting some sort of GLSL like swizzle.

Öö Tiib

unread,

Jan 4, 2020, 1:37:46 AM1/4/20

to

Sorry, you are correct, I misread or misremembered. The problem it
supposedly solves is a bit confusing to me. I am not saying we do
not need it, just that I do not understand in what situation I
would need it.

Bonita Montero

unread,

Jan 4, 2020, 2:08:10 AM1/4/20

to

>> That's not as easy to test as you might think.

> It is not easy, that is why I wrote tests...

You didn't test what I disassembled.

Bonita Montero

unread,

Jan 4, 2020, 2:34:22 AM1/4/20

to

>> It is not easy, that is why I wrote tests...

> You didn't test what I disassembled.

Here's a little test:

#include <iostream>
#include <cstring>
#include <cstdint>
#include <chrono>

using namespace std;
using namespace chrono;

size_t const SIZE = 200;

struct S
{
    char c[SIZE];
};

// Kept as a separate function so the memcpy is compiled on its own.
void fCopy( S *a, S *b )
{
    memcpy( a->c, b->c, SIZE );
}

int main()
{
    // The volatile function pointer keeps the compiler from inlining
    // fCopy or dropping the calls altogether.
    void (*volatile pfCopy)( S *, S * ) = fCopy;
    S a, b = {};
    time_point<high_resolution_clock> start = high_resolution_clock::now();
    for( size_t n = 1'000'000; n; --n )
        pfCopy( &a, &b );
    uint64_t ns = (uint64_t)duration_cast<nanoseconds>(
        high_resolution_clock::now() - start ).count();
    cout << (double)ns / 1.0E6 << "ms" << endl;
}

Under Windows the execution-time of the loop is 3.96ms.
Under Linux the execution time is 12.96ms.
SSE simply rules here.

Bonita Montero

unread,

Jan 4, 2020, 2:43:46 AM1/4/20

to

> ...

> Under Windows the execution-time of the loop is 3.96ms.
> Under Linux the execution time is 12.96ms.
> SSE simply rules here.

MSVC even uses AVX2 for memcpy when I enable it via compiler-switch.
That's what the fCopy-code looks like then:

vmovups xmm0, XMMWORD PTR [rdx]
vmovups XMMWORD PTR [rcx], xmm0
vmovups xmm1, XMMWORD PTR [rdx+16]
vmovups XMMWORD PTR [rcx+16], xmm1
vmovups xmm0, XMMWORD PTR [rdx+32]
vmovups XMMWORD PTR [rcx+32], xmm0
vmovups xmm1, XMMWORD PTR [rdx+48]
vmovups XMMWORD PTR [rcx+48], xmm1
vmovups xmm0, XMMWORD PTR [rdx+64]
vmovups XMMWORD PTR [rcx+64], xmm0
vmovups xmm1, XMMWORD PTR [rdx+80]
vmovups XMMWORD PTR [rcx+80], xmm1
vmovups xmm0, XMMWORD PTR [rdx+96]
vmovups XMMWORD PTR [rcx+96], xmm0
vmovups xmm0, XMMWORD PTR [rdx+112]
vmovups XMMWORD PTR [rcx+112], xmm0
vmovups xmm1, XMMWORD PTR [rdx+128]
sub rcx, -128
vmovups XMMWORD PTR [rcx], xmm1
vmovups xmm0, XMMWORD PTR [rdx+144]
vmovups XMMWORD PTR [rcx+16], xmm0
vmovups xmm1, XMMWORD PTR [rdx+160]
vmovups XMMWORD PTR [rcx+32], xmm1
vmovups xmm0, XMMWORD PTR [rdx+176]
vmovups XMMWORD PTR [rcx+48], xmm0

mov rax, QWORD PTR [rdx+192]
mov QWORD PTR [rcx+64], rax
ret 0

But the performance is only slightly better with AVX2 over SSE
(3.68ms).

Melzzzzz

unread,

Jan 4, 2020, 3:20:17 AM1/4/20

to

On 2020-01-04, Bonita Montero <Bonita....@gmail.com> wrote:

That's not a test of that code.

Melzzzzz

unread,

Jan 4, 2020, 3:25:09 AM1/4/20

to

On 2020-01-04, Bonita Montero <Bonita....@gmail.com> wrote:

It is a test of rep movs vs non-temporal moves with SSE/AVX2.
Non-temporal moves with SSE/AVX2 would be slower.

Bonita Montero

unread,

Jan 4, 2020, 3:25:22 AM1/4/20

to

>> #include <iostream>
>> #include <cstring>
>> #include <cstdint>
>> #include <chrono>
>>
>> using namespace std;
>> using namespace chrono;
>>
>> size_t const SIZE = 200;
>>
>> struct S
>> {
>> char c[200];
>> };
>>
>> void fCopy( S *a, S *b )
>> {
>> memcpy( a->c, b->c, SIZE );
>> }
>>
>> int main()
>> {
>> void (*volatile pfCopy)( S *, S * ) = fCopy;
>> S a, b;
>> time_point<high_resolution_clock> start = high_resolution_clock::now();
>> for( size_t n = 1'000'000; n; --n )
>> pfCopy( &a, &b );
>> uint64_t ns = (uint64_t)duration_cast<nanoseconds>(
>> high_resolution_clock::now() - start ).count();;
>> cout << (double)ns / 1.0E6 << "ms" << endl;
>> }
>> Under Windows the execution-time of the loop is 3.96ms.
>> Under Linux the execution time is 12.96ms.
>> SSE simply rules here.

> That's not a test of that code.

It is.

Bonita Montero

unread,

Jan 4, 2020, 3:25:53 AM1/4/20

to

>> You didn't test what I disassembled.

> It is test of rep movs vs non temporal moves with SSE/AVX2.
> non temporal moves with SSE/AVX2 wuold be slower.

Then you're wrong in this part of the thread.

Melzzzzz

unread,

Jan 4, 2020, 3:27:46 AM1/4/20

to

On 2020-01-04, Bonita Montero <Bonita....@gmail.com> wrote:

No, I am not. For a block of that size the gcc code is more efficient...

Bonita Montero

unread,

Jan 4, 2020, 3:36:42 AM1/4/20

to

>>> It is test of rep movs vs non temporal moves with SSE/AVX2.
>>> non temporal moves with SSE/AVX2 wuold be slower.

>> Then you're wrong in this part of the thread.

> No I am not for block of that size gcc code is more efficient...

You're at the wrong place in the subthread. What I said started here:
<qunp5o$17ci$1...@gioia.aioe.org> - and everything below that posting
should be related to what I said there. But you're simply a stupid
person who doesn't understand this.

Melzzzzz

unread,

Jan 4, 2020, 3:41:36 AM1/4/20

to

On 2020-01-04, Bonita Montero <Bonita....@gmail.com> wrote:

No, you are a buffoon. I just commented on the assembly output.
It would be interesting to see what code gcc would generate for
larger blocks...

Bonita Montero

unread,

Jan 4, 2020, 3:49:44 AM1/4/20

to

> No, you are buffon. I just commented on assembly output.
> It would be interresting what code would gcc generate on
> larger blocks...

But this isn't true:

Melzzzzz

unread,

Jan 4, 2020, 4:01:26 AM1/4/20

to

On 2020-01-04, Bonita Montero <Bonita....@gmail.com> wrote:

You will never learn anything, because you don't listen to those more
experienced...

Bonita Montero

unread,

Jan 4, 2020, 4:02:40 AM1/4/20

to

>>> For small blocks movs instructions are better. For big ones
>>> opposite.

> You will never learn anything. Because you don't listen more
> experienced...

I'll listen to the experience of the compiler-writers.
They surely have more experience than you.

Melzzzzz

unread,

Jan 4, 2020, 4:19:57 AM1/4/20

to

On 2020-01-04, Bonita Montero <Bonita....@gmail.com> wrote:

You are dumb. You clearly see different code that does not support your
presumption. If it were true, the generated assembly code for the same code
snippet would be the same and all compilers would be equally efficient. Not
realizing that fact makes you dumb.

Bonita Montero

unread,

Jan 4, 2020, 4:39:02 AM1/4/20

to

Am 04.01.2020 um 10:19 schrieb Melzzzzz:
> On 2020-01-04, Bonita Montero <Bonita....@gmail.com> wrote:
>>>>> For small blocks movs instructions are better. For big ones
>>>>> opposite.
>>
>>> You will never learn anything. Because you don't listen more
>>> experienced...
>>
>> I'll listen to the experience of the compiler-writers.
>> They're have for sure more experience than you.
>
> You are dumb. You clearly see different code that does not holds your
> presumption. If that was true generated assembly code for same code snippet
> would be same and all compilers woudl be equally efficient. Not ralizing
> taht fact makes you a dumb.

Look at this:

#include <iostream>
#include <cstring>
#include <cstdint>
#include <chrono>

using namespace std;
using namespace chrono;

extern "C" void fAvx();
extern "C" void fMovs();

int main()
{
    using timestamp = time_point<high_resolution_clock>;
    timestamp start = high_resolution_clock::now();
    fAvx();
    uint64_t ns = (uint64_t)duration_cast<nanoseconds>(
        high_resolution_clock::now() - start ).count();
    cout << (double)ns / 1.0E6 << "ms" << endl;

    start = high_resolution_clock::now();
    fMovs();
    ns = (uint64_t)duration_cast<nanoseconds>(
        high_resolution_clock::now() - start ).count();
    cout << (double)ns / 1.0E6 << "ms" << endl;
}

And the assembly code for fAvx and fMovs:

_TEXT SEGMENT

fAvx PROC
sub rsp, 64
mov rdx, 1000000000
avxLoop:
vmovups ymm0, [rsp]
vmovups [rsp + 32], ymm0
dec rdx
jnz avxLoop
add rsp, 64
ret
fAvx ENDP

fMovs PROC
push rsi
push rdi
sub rsp, 64
mov rdx, 1000000000
movsLoop:
mov rcx, 4
mov rsi, rsp
lea rdi, [rsp + 32]
rep movsq
dec rdx
jnz movsLoop
add rsp, 64
pop rdi
pop rsi
ret
fMovs ENDP

_TEXT ENDS
END

I repeatedly copy 32 bytes - _a_small_amount_of_memory_ - and according
to what you say, copying with movs should be faster here. But on my computer
fAvx is 6.6 times faster. Any questions?

David Brown

unread,

Jan 4, 2020, 6:45:12 AM1/4/20

to

On 04/01/2020 01:25, Bonita Montero wrote:
>> For small blocks movs instructions are better.
>> For big ones opposite.
>

> The compiler knows better than stupid Melzzz.
>

I suspect the results will be highly dependent on details, like the
exact chip you are using, and where you draw the line between "small
blocks" and "big blocks".

Compiler writers put a lot of effort into fine-tuning this kind of thing
for different situations. But the compiler can only optimise on the
basis of the information you give it. If you want optimal code for a
particular device, you have to tell the compiler exactly which device
you want to target (and what devices you want to support, which is not
necessarily the same thing). Then it can make a stab at picking the
right kind of instructions here - whether it is SIMD, loops, rep
instructions, AVX, or whatever.

Different compilers can make different assumptions about the target
device, when this information is missing from the command lines.

So in any argument about which instruction sequence is "best", it is
entirely possible for both sides to be right and very likely that the
argument is pointless.

Melzzzzz

unread,

Jan 4, 2020, 7:00:15 AM1/4/20

to

In this particular case gcc is right and VC is wrong.
~/examples/assembler >>> ./rdtscp 1000000
1000000 128 byte blocks, loops:4
rep movsb 0.01132903184211
rep movsq 0.01134612947368
movntdq 0.00317828842105
movntdq prefetch 0.00316638236842
movntdq prefetch ymm 0.00316397368421
~/examples/assembler >>> ./rdtscp 1 [32]
1 128 byte blocks, loops:4000000
rep movsb 0.00000002601557
rep movsq 0.00000001063161
movntdq 0.00000008210285
movntdq prefetch 0.00000008319713
movntdq prefetch ymm 0.00000008284743

Simply put: for large blocks SSE2/AVX2 is better, for small ones movs,
on both Intel and AMD.
Before Zen2, AVX2 was better on Intel, as it has 256-bit
data moves while AMD prior to Zen2 had 128-bit.

Bonita Montero

unread,

Jan 4, 2020, 7:03:44 AM1/4/20

to

> simply put for large blocks SSE2/AVX2 is better for
> small movs on both Intel and AMD.

No, movsd/q is always inferior.

Bonita Montero

unread,

Jan 4, 2020, 8:22:26 AM1/4/20

to

> I suspect the results will be highly dependent on details, like the
> exact chip you are using, and where you draw the line between "small
> blocks" and "big blocks".

Here's a little benchmark that compares rep movsq with avx-copying
(without loop-unrolling!):

C++-Code:

#include <Windows.h>

#include <iostream>
#include <cstring>
#include <cstdint>
#include <chrono>

#include <intrin.h>

using namespace std;
using namespace chrono;

extern "C" void fAvx( __m256 *src, __m256 *dst, size_t size, size_t repts );
extern "C" void fMovs( __m256 *src, __m256 *dst, size_t size, size_t repts );

int main()
{
    size_t const PAGE   = 4096,
                 ROUNDS = 100'000;
    char *pPage = (char *)VirtualAlloc( nullptr, 2 * PAGE,
                                        MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE );
    __m256 *src = (__m256 *)pPage,
           *dst = (__m256 *)(pPage + PAGE);
    memset( pPage, 0, 2 * PAGE );

    using timestamp = time_point<high_resolution_clock>;

    // size is the number of 32-byte blocks copied per call
    for( size_t size = 1; size <= (PAGE / 32); ++size )
    {
        timestamp start = high_resolution_clock::now();
        fAvx( src, dst, size, ROUNDS );
        uint64_t avxNs = (uint64_t)duration_cast<nanoseconds>(
            high_resolution_clock::now() - start ).count();

        start = high_resolution_clock::now();
        fMovs( src, dst, size, ROUNDS );
        uint64_t movsNs = (uint64_t)duration_cast<nanoseconds>(
            high_resolution_clock::now() - start ).count();

        cout << "size: " << size << "\tavx:\t" << avxNs / 1.0E6
             << "\tmovs\t" << movsNs / 1.0E6 << endl;
    }
}

Asm-Code:

_TEXT SEGMENT

; void fAvx( __m256 *src, __m256 *dst, size_t count, size_t repts );
; rcx: src
; rdx: dst
; r8: count
; r9: repts
fAvx PROC
test r9, r9
jz zero
test r8, r8
jz zero
mov rax, r8
shl rax, 5
add rax, rcx
sub rdx, rcx
mov r10, rcx
mov r11, rdx
jmp avxLoop
reptLoop:
mov rcx, r10
mov rdx, r11
avxLoop:
vmovups ymm0, [rcx]
vmovups [rcx+rdx], ymm0
add rcx, 32
cmp rcx, rax
jne avxLoop
dec r9
jnz reptLoop
zero:
ret
fAvx ENDP

; void fMovs( __m256 *src, __m256 *dst, size_t count, size_t repts );
; rcx: src
; rdx: dst
; r8: count
; r9: repts
fMovs PROC
test r9, r9
jz zero
push rsi
push rdi
mov r10, rcx
mov r11, rdx
lea rdx, [r8 * 4]
reptLoop:
mov rsi, r10
mov rdi, r11
mov rcx, rdx
rep movsq
dec r9
jnz reptLoop
pop rdi
pop rsi
zero:

ret
fMovs ENDP

_TEXT ENDS
END

That's the relative speedup of AVX over rep movsq:

size: 1 1383,79%
size: 2 737,12%
size: 3 433,35%
size: 4 342,41%
size: 5 283,20%
size: 6 431,57%
size: 7 351,47%
size: 8 340,53%
size: 9 314,24%
size: 10 325,57%
size: 11 270,96%
size: 12 327,83%
size: 13 296,13%
size: 14 275,73%
size: 15 284,19%
size: 16 317,27%
size: 17 331,54%
size: 18 266,05%
size: 19 287,00%
size: 20 281,83%
size: 21 276,17%
size: 22 261,85%
size: 23 263,01%
size: 24 251,48%
size: 25 247,98%
size: 26 237,64%
size: 27 239,66%
size: 28 187,04%
size: 29 185,92%
size: 30 189,09%
size: 31 168,90%
size: 32 179,31%
size: 33 220,31%
size: 34 192,71%
size: 35 207,33%
size: 36 214,69%
size: 37 156,90%
size: 38 169,47%
size: 39 184,87%
size: 40 159,98%
size: 41 175,79%
size: 42 156,60%
size: 43 162,29%
size: 44 155,36%
size: 45 158,09%
size: 46 164,42%
size: 47 154,88%
size: 48 164,17%
size: 49 155,84%
size: 50 157,59%
size: 51 148,29%
size: 52 152,67%
size: 53 139,59%
size: 54 149,78%
size: 55 140,99%
size: 56 146,94%
size: 57 142,01%
size: 58 148,15%
size: 59 141,62%
size: 60 152,89%
size: 61 152,00%
size: 62 149,20%
size: 63 150,13%
size: 64 150,45%
size: 65 140,96%
size: 66 132,11%
size: 67 142,80%
size: 68 135,96%
size: 69 146,18%
size: 70 140,17%
size: 71 139,63%
size: 72 139,22%
size: 73 131,02%
size: 74 145,43%
size: 75 138,23%
size: 76 132,02%
size: 77 142,05%
size: 78 135,97%
size: 79 136,52%
size: 80 138,93%
size: 81 136,06%
size: 82 138,59%
size: 83 139,08%
size: 84 134,50%
size: 85 136,64%
size: 86 134,28%
size: 87 133,35%
size: 88 129,82%
size: 89 138,07%
size: 90 132,57%
size: 91 125,16%
size: 92 138,73%
size: 93 135,70%
size: 94 131,55%
size: 95 126,62%
size: 96 134,87%
size: 97 130,83%
size: 98 129,21%
size: 99 126,70%
size: 100 133,07%
size: 101 129,39%
size: 102 129,12%
size: 103 125,27%
size: 104 124,14%
size: 105 131,78%
size: 106 132,87%
size: 107 131,40%
size: 108 128,29%
size: 109 122,95%
size: 110 121,13%
size: 111 121,73%
size: 112 126,26%
size: 113 130,87%
size: 114 131,31%
size: 115 124,70%
size: 116 119,53%
size: 117 121,42%
size: 118 120,34%
size: 119 125,65%
size: 120 124,95%
size: 121 130,36%
size: 122 128,35%
size: 123 128,25%
size: 124 127,47%
size: 125 124,28%
size: 126 124,14%
size: 127 122,69%
size: 128 122,76%

So movsq is never faster.
Here's the result graphically: https://app.unsee.cc/#45f34f42
So it's also exactly the opposite of what Melzzz said: movsq becomes
more competitive as the block size rises.

Chris M. Thomasson

unread,

Jan 4, 2020, 7:51:45 PM1/4/20

to

On 1/3/2020 4:26 PM, Mr Flibble wrote:
> On 04/01/2020 00:21, Melzzzzz wrote:

>>> clean:
>>>
>>> v.x = 42; f(v.x)
>>>
>>> not-clean:
>>>
>>> v.set_x(42); f(v.x());
>>

>> It's just syntax. Preference.
>> Object vs functional style.
>> We all know that is advisable not to touch member variables directly.
>> When that changed?
>
> There isn't a class invariant so it is no different to using a "struct"
> (yes, I know there isn't much difference between "struct" and "class" in
> C++).

Side note, I actually saw a person claim that there was no difference,
then complained that everything is private.

Daniel

unread,

Jan 8, 2020, 12:41:11 PM1/8/20

to

On Thursday, January 2, 2020 at 1:26:46 PM UTC-5, Öö Tiib wrote:

> Experiments of mine and others have shown that usage of memcpy
> instead of other ways of type punning tends to generate better or
> same codes. It is hard to find examples where other methods are
> really beneficial.

Does that observation apply to, eg. bit casting

uint8_t v[8000000] to float[1000000]

(which is a typical case in deserialization of typed arrays when endianness lines up.)

Thanks,
Daniel

Öö Tiib

unread,

Jan 8, 2020, 4:24:03 PM1/8/20

to

Note that bytes are special, since memory consists of them. It is
not type punning to convert between a POD object and its bytes.
Otherwise memcpy itself would be wrong. "Type punning" usually means
things like converting between an IEEE double and a uint64_t.

However, when we deserialize we typically get those bytes from
somewhere as smaller packets. For example, an Ethernet packet is
1500 bytes + 26 bytes of header, and there is usually other
protocol overhead, so fewer than 1500 bytes are left for floats.
Then we memcpy from there into our floats. I can't
even imagine how you suggest to use a union there instead. Can you?
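
As a minimal sketch of what I mean (the function name `floats_from_bytes` is made up, and it assumes sender and receiver agree on float format and endianness):

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Copies as many whole floats as fit into the received byte range.
// Any trailing bytes that do not form a complete float are ignored here.
inline std::vector<float> floats_from_bytes(const std::uint8_t* data,
                                            std::size_t size)
{
    std::vector<float> out(size / sizeof(float));
    if (!out.empty())
        std::memcpy(out.data(), data, out.size() * sizeof(float));
    return out;
}
```

The source buffer does not need to be aligned for float, which is one reason memcpy is the safe choice over casting the byte pointer.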

Daniel

unread,

Jan 9, 2020, 1:41:17 AM1/9/20

to

On Wednesday, January 8, 2020 at 4:24:03 PM UTC-5, Öö Tiib wrote:
>
> However when we deserialize then we typically get those bytes from
> somewhere as smaller packets. For example Ethernet packet is
> 1500 bytes + 26 bytes of header and there are usually other
> garbage by protocol so for floats there is less than 1500 bytes.
> Then we memcpy from there into our floats. I can't
> even imagine how you suggest to use union there instead. Can you?

No :-)

Daniel

Daniel

unread,

Jan 9, 2020, 1:52:31 AM1/9/20

to

On Wednesday, January 8, 2020 at 4:24:03 PM UTC-5, Öö Tiib wrote:
>
> However when we deserialize then we typically get those bytes from
> somewhere as smaller packets. For example Ethernet packet is
> 1500 bytes + 26 bytes of header and there are usually other
> garbage by protocol so for floats there is less than 1500 bytes.
> Then we memcpy from there into our floats.

But if the last item in the packet is half a float, do you still memcpy
directly into floats?

Daniel

Öö Tiib

unread,

Jan 9, 2020, 3:59:40 PM1/9/20

to

What do you mean by "half a float"? Half-received?
If half-received then maybe yes, maybe no. Hard to
tell without context.

For example, if the program needs to process received floats even
before all of them have arrived, then I probably would not copy
half-floats. But if it may refuse to work until all floats are received,
then I probably would copy the half, but would throw after a timeout,
destroying all the received floats without processing them.
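
The first variant, where only complete floats are handed on and the half-received one waits for the next packet, could be sketched like this. The class `float_stream` and its member names are invented for the example, and the timeout handling is left out.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical accumulator: complete floats are decoded as soon as all of
// their bytes have arrived; a partial float stays pending until the next feed.
class float_stream
{
    std::vector<float> values_;
    unsigned char pending_[sizeof(float)];
    std::size_t pending_size_ = 0;

public:
    void feed(const unsigned char* data, std::size_t size)
    {
        for (std::size_t i = 0; i < size; ++i)
        {
            pending_[pending_size_++] = data[i];
            if (pending_size_ == sizeof(float))
            {
                float f;
                std::memcpy(&f, pending_, sizeof f);
                values_.push_back(f);
                pending_size_ = 0;
            }
        }
    }

    const std::vector<float>& values() const { return values_; }
    std::size_t pending_bytes() const { return pending_size_; }
};
```

The byte-at-a-time loop is chosen for clarity. A real receiver would probably memcpy the bulk of each packet and only treat the leading and trailing partial floats specially.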

Daniel

unread,

Jan 9, 2020, 4:32:24 PM1/9/20

to

On Thursday, January 9, 2020 at 3:59:40 PM UTC-5, Öö Tiib wrote:
>
> What you mean by "half a float"? Half-received?

Half-received

> If half-received then may be yes, may be no. Hard to
> tell without context.
>
> For example if program needs to process received floats even
> without all received then I probably would not copy half-floats.
> But if it may refuse to work until all floats are received then
> I probably would copy half but would throw after timeout
> destroying all the received floats and not processing anyway.

Fair enough.

Daniel

Mist...@honorific.org

unread,

Jan 10, 2020, 7:28:46 AM1/10/20

to

On Thu, 9 Jan 2020 12:59:29 -0800 (PST)
Öö Tiib <oot...@hot.ee> wrote:
>On Thursday, 9 January 2020 08:52:31 UTC+2, Daniel wrote:
>> On Wednesday, January 8, 2020 at 4:24:03 PM UTC-5, Öö Tiib wrote:
>> >
>> > However when we deserialize then we typically get those bytes from
>> > somewhere as smaller packets. For example Ethernet packet is
>> > 1500 bytes + 26 bytes of header and there are usually other
>> > garbage by protocol so for floats there is less than 1500 bytes.
>> > Then we memcpy from there into our floats.
>>
>> But if the last item in the packet is half a float, do you still memcpy
>> directly into floats?
>
>What you mean by "half a float"? Half-received?
>If half-received then may be yes, may be no. Hard to
>tell without context.
>
>For example if program needs to process received floats even
>without all received then I probably would not copy half-floats.
>But if it may refuse to work until all floats are received then
>I probably would copy half but would throw after timeout
>destroying all the received floats and not processing anyway.

These sorts of problems are why any genuine network programmer always includes
the expected packet size in a stream packet (usually at the beginning) instead
of doing something stupid like sending raw XML or JSON on its own, where
constant parsing is required to see if the end of the data has been reached yet,
and the parser may get it wrong if the formatting is bad and continue waiting
for data beyond the end. For UDP, where data sent in a single write is returned
in a single read, that approach is fine; not so for TCP. Sadly a new generation
of programmers has yet to figure this out.

David Brown

unread,

Jan 10, 2020, 8:02:50 AM1/10/20

to

On 10/01/2020 13:28, Mist...@honorific.org wrote:
> On Thu, 9 Jan 2020 12:59:29 -0800 (PST)
> Öö Tiib <oot...@hot.ee> wrote:
>> On Thursday, 9 January 2020 08:52:31 UTC+2, Daniel wrote:
>>> On Wednesday, January 8, 2020 at 4:24:03 PM UTC-5, Öö Tiib wrote:
>>>>
>>>> However when we deserialize then we typically get those bytes from
>>>> somewhere as smaller packets. For example Ethernet packet is
>>>> 1500 bytes + 26 bytes of header and there are usually other
>>>> garbage by protocol so for floats there is less than 1500 bytes.
>>>> Then we memcpy from there into our floats.
>>>
>>> But if the last item in the packet is half a float, do you still memcpy
>>> directly into floats?
>>
>> What you mean by "half a float"? Half-received?
>> If half-received then may be yes, may be no. Hard to
>> tell without context.
>>
>> For example if program needs to process received floats even
>> without all received then I probably would not copy half-floats.
>> But if it may refuse to work until all floats are received then
>> I probably would copy half but would throw after timeout
>> destroying all the received floats and not processing anyway.
>
> These sort of problems are why any genuine network programmer always includes

This is a matter of protocol design, not programming. Any "genuine
network programmer" writes code to handle the protocol specified -
whether it includes a packet size or not.

Protocols where the length of the telegram is given either explicitly in
a header, or implied from the protocol type, make reception simpler.
But they can involve more work when generating the data. There is a
trade-off here, and it is not as simple as saying one method is always
best. (Having said that, I prefer to give the length explicitly when
designing protocols.)

> the expected packet size in a stream packet (usually at the beginning) instead
> of doing something stupid like sending raw XML or json on its own where
> constant parsing is required to see if the end of the data has been reached yet
> and may get it wrong if the formatting is bad and continue waiting for data
> beyond the end. For UDP where data send in a single write is returned in a
> single read that approach is fine, not so for TCP. Sadly a new generation of
> programmers has yet to figure this out.
>

Some protocols on top of TCP/IP or UDP, such as MQTT, have length fields
in the telegrams.

For JSON, a very simple solution is to use newline-delimited JSON - you
send the JSON string, then a linefeed character. It's easy to see when
you have the whole string, because you have found a character 10, and
can then start parsing.
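
A minimal sketch of the receiving side of that idea, assuming the bytes arrive in arbitrary chunks from a stream socket. The class name `LineFramer` and its `feed`/`pending` members are made up for illustration; they are not from any library mentioned in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: accumulates bytes read from a stream and hands back
// each complete newline-delimited message. Parsing the JSON itself is left
// to whatever JSON library is in use.
class LineFramer {
public:
    // Append newly received bytes; return every message completed so far.
    std::vector<std::string> feed(const std::string& chunk) {
        buffer_ += chunk;
        std::vector<std::string> messages;
        std::size_t start = 0;
        for (;;) {
            const std::size_t end = buffer_.find('\n', start);  // byte 10
            if (end == std::string::npos) break;                // incomplete tail
            messages.push_back(buffer_.substr(start, end - start));
            start = end + 1;
        }
        buffer_.erase(0, start);  // keep only the unfinished tail
        assert(buffer_.find('\n') == std::string::npos);
        return messages;
    }

    // Number of buffered bytes that do not yet form a complete message.
    std::size_t pending() const { return buffer_.size(); }

private:
    std::string buffer_;
};
```

Feeding it one and a half messages returns the first message and keeps the half; the rest is returned once its linefeed arrives. This only works if the sender never emits a raw linefeed inside a message.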

I think it would be nice to see more use of SCTP, which combines the
benefits of TCP/IP and UDP - letting you get the "datagram" feature of
UDP while also getting the guarantees and fragmentation control of
TCP/IP. But it is not well supported in common routers (or Windows),
making it a lot harder to establish it for common use.

Mist...@honorific.org

Jan 10, 2020, 11:21:26 AM

Well yes, but I was talking about when designing one.

>> and may get it wrong if the formatting is bad and continue waiting for data
>> beyond the end. For UDP where data send in a single write is returned in a
>> single read that approach is fine, not so for TCP. Sadly a new generation of
>> programmers has yet to figure this out.
>>
>
>Some protocols on top of TCP/IP or UDP, such as MQTT, have length fields
>in the telegrams.
>
>For JSON, a very simple solution is to use newline-delimited JSON - you
>send the JSON string, then a linefeed character. It's easy to see when
>you have the whole string, because you have found a character 10, and
>can then start parsing.

That'll work fine - until you have ASCII 10 inside a quoted string, which is
entirely plausible. The same problem exists if doing simple curly bracket
counting. It's a non-trivial problem and shouldn't be something the network
layer has to worry about.

>I think it would be nice to see more use of SCTP, which combines the

Unfortunately SCTP is de facto dead. It was a good attempt but didn't get the
traction.

David Brown

Jan 10, 2020, 1:18:43 PM

No, it is not plausible - you can't have ASCII 10 in the JSON string.
It must be escaped as "\n".

>
>> I think it would be nice to see more use of SCTP, which combines the
>
> Unfortunately SCTP is de facto dead. It was a good attempt but didn't get the
> traction.
>

Exactly.

Daniel

Jan 10, 2020, 6:41:32 PM

On Friday, January 10, 2020 at 1:18:43 PM UTC-5, David Brown wrote:

> No, it is not plausible - you can't have ASCII 10 in the JSON string.
> It must be escaped as "\n".
>

ASCII 10 in JSON quoted string values must be escaped as "\n", but
the JSON text may otherwise contain unescaped white space characters, including
ASCII 10.

Daniel

Jorgen Grahn

Jan 11, 2020, 5:48:43 AM

On Fri, 2020-01-10, Mist...@honorific.org wrote:
> On Fri, 10 Jan 2020 14:02:38 +0100
> David Brown <david...@hesbynett.no> wrote:

...

>>I think it would be nice to see more use of SCTP, which combines the
>
> Unfortunately SCTP is de facto dead. It was a good attempt but didn't get the
> traction.

I believe SCTP has its entrenched niche uses, in at least telecom,
replacing older horrors.

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .

David Brown

Jan 11, 2020, 6:01:35 AM

I checked again, and you are entirely correct - any of the four standard
whitespace characters are allowed (and ignored) outside of strings in JSON.

However, in newline-delimited JSON, which I was talking about, you can't
have ASCII 10 or ASCII 13 in the encoding. (Use \n and \r in strings.)
This is precisely so that it is suitable for streamed packets, and it
lets you find the end of each message simply by scanning for ASCII 10.
(Strings are in UTF-8, and byte 10 would not be a valid part of a UTF-8
string except as a LF character.)
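
On the sending side, the rule comes down to escaping control characters inside strings and adding no whitespace between tokens. A rough sketch, assuming a hand-rolled encoder; `escape_json_string` and `ndjson_record` are illustrative names, and a real JSON library would normally do this job.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: escape a UTF-8 string for use between the quotes of
// a JSON string. Control characters below 0x20 are never emitted raw.
std::string escape_json_string(const std::string& in) {
    std::string out;
    for (unsigned char c : in) {
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n";  break;  // ASCII 10
            case '\r': out += "\\r";  break;  // ASCII 13
            case '\t': out += "\\t";  break;
            default:
                if (c < 0x20) {
                    char buf[7];  // "\uXXXX" plus terminating NUL
                    std::snprintf(buf, sizeof buf, "\\u%04x",
                                  static_cast<unsigned>(c));
                    out += buf;
                } else {
                    out += static_cast<char>(c);  // UTF-8 bytes pass through
                }
        }
    }
    // Postcondition: no raw linefeed survives inside the string.
    assert(out.find('\n') == std::string::npos);
    return out;
}

// Hypothetical helper: a one-member JSON object followed by the delimiter.
std::string ndjson_record(const std::string& key, const std::string& value) {
    return "{\"" + escape_json_string(key) + "\":\"" +
           escape_json_string(value) + "\"}\n";
}
```

With this, the only byte 10 in a record is the final delimiter, so the receiver can split on it without parsing.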

So I would have been correct if I had stuck to referring to
line-delimited or newline-delimited JSON. But I was wrong when I
referred to general JSON. Sorry for the mix-up, and thank you for the
correction.

Jorgen Grahn

Jan 11, 2020, 6:05:59 AM

On Fri, 2020-01-10, Mist...@honorific.org wrote:

> On Thu, 9 Jan 2020 12:59:29 -0800 (PST)
> Öö Tiib <oot...@hot.ee> wrote:

...

>>For example if program needs to process received floats even
>>without all received then I probably would not copy half-floats.
>>But if it may refuse to work until all floats are received then
>>I probably would copy half but would throw after timeout
>>destroying all the received floats and not processing anyway.
>
> These sort of problems are why any genuine network programmer always includes
> the expected packet size in a stream packet (usually at the beginning) instead
> of doing something stupid like sending raw XML or json on its own where
> constant parsing is required to see if the end of the data has been reached yet

I kind of disagree. Having an end marker is a perfectly good way to
split a stream into messages. SMTP is one of many traditional
protocols which do it this way.

I agree it's not good to have a weakly defined end-of-message, or
one that's expensive to find.

Sending the message size before the message typically means you have
to generate the message and buffer it locally before you can begin to
transmit it -- which is inefficient for large messages, or messages
which are expensive to generate.
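
To make that cost concrete, here is a small sketch of a length-prefixed sender. The 4-byte big-endian length field and the name `frame_with_length` are assumptions for illustration, not something specified in this thread.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <string>

// Hypothetical framing: 4-byte big-endian length, then the payload.
// The complete payload must already exist, because its size goes first.
std::string frame_with_length(const std::string& payload) {
    if (payload.size() > std::numeric_limits<std::uint32_t>::max()) {
        throw std::length_error("payload too large for a 32-bit length field");
    }
    const auto n = static_cast<std::uint32_t>(payload.size());

    std::string frame;
    frame.reserve(4 + payload.size());
    for (int shift = 24; shift >= 0; shift -= 8) {
        frame.push_back(static_cast<char>((n >> shift) & 0xFFu));
    }
    frame += payload;
    assert(frame.size() == payload.size() + 4);
    return frame;
}
```

Note that the function takes the finished payload as its argument: nothing can go on the wire until the whole message has been generated, whereas a sender using an end marker can transmit as it generates.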

> and may get it wrong if the formatting is bad and continue waiting for data
> beyond the end.

That's IMO part of normal robustness, which any "genuine" network
programmer needs to implement. (Part of it involves giving up after a
timeout: if a message has been partly received and nothing more
happens after a few seconds, it's probably never going to happen.)

Mist...@honorific.org

Jan 11, 2020, 6:53:50 AM

On Fri, 10 Jan 2020 19:18:31 +0100

David Brown <david...@hesbynett.no> wrote:
>On 10/01/2020 17:21, Mist...@honorific.org wrote:
>> On Fri, 10 Jan 2020 14:02:38 +0100

>> That'll work fine - until you have ascii 10 inside a quoted string which is
>> entirely plausible. The same problem exists if doing simple curly bracket
>> counting. Its a non trivial problem and shouldn't be something the network
>> layer has to worry about.
>
>No, it is not plausible - you can't have ASCII 10 in the JSON string.
>It must be escaped as "\n".

I'll take your word for that, but I've definitely seen JSON strings with raw
newlines in. Obviously not following the JSON spec.

Mist...@honorific.org

Jan 11, 2020, 6:55:45 AM

On 11 Jan 2020 11:05:49 GMT

Jorgen Grahn <grahn...@snipabacken.se> wrote:
>On Fri, 2020-01-10, Mist...@honorific.org wrote:
>> On Thu, 9 Jan 2020 12:59:29 -0800 (PST)

>> and may get it wrong if the formatting is bad and continue waiting for data
>> beyond the end.
>
>That's IMO part of normal robustness, which any "genuine" network
>programmer needs to implement. (Part of it involves giving up after a
>timeout: if a message has been partly received and nothing more
>happens after a few seconds, it's probably never going to happen.)

Partial reception is the least of your worries - reading part of the next
message by accident is far more of an issue. All this goes away with a length
field which is why all sane network and application layer protocols use one.
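
A sketch of how a length field deals with a read that returns the end of one message plus the start of the next. The 4-byte big-endian header and the names `LengthFramer`, `feed` and `pending` are assumptions for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical receiver for frames of the form: 4-byte big-endian length,
// then that many payload bytes. Bytes belonging to the next frame stay in
// the buffer until that frame is complete.
class LengthFramer {
public:
    std::vector<std::string> feed(const std::string& chunk) {
        buffer_ += chunk;
        std::vector<std::string> messages;
        std::size_t pos = 0;
        while (buffer_.size() - pos >= 4) {  // a full header is available
            const std::uint32_t n =
                (byte_at(pos) << 24) | (byte_at(pos + 1) << 16) |
                (byte_at(pos + 2) << 8) | byte_at(pos + 3);
            if (buffer_.size() - pos - 4 < n) break;  // payload incomplete
            messages.push_back(buffer_.substr(pos + 4, n));
            pos += 4 + static_cast<std::size_t>(n);
        }
        assert(pos <= buffer_.size());
        buffer_.erase(0, pos);  // drop only what was fully consumed
        return messages;
    }

    std::size_t pending() const { return buffer_.size(); }

private:
    std::uint32_t byte_at(std::size_t i) const {
        return static_cast<unsigned char>(buffer_[i]);
    }
    std::string buffer_;
};
```

A real receiver would probably also cap the accepted length and apply a timeout, since a corrupt header would otherwise leave it waiting for bytes that never arrive.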

David Brown

Jan 11, 2020, 7:21:13 AM

See Daniel's post, and my reply to that. ASCII 10 (and 13) are allowed
in JSON as white space, which is ignored outside strings. But
specifically for line-delimited or newline-delimited JSON, where you
have a newline character after the JSON string, you are /not/ allowed
ASCII 10 or ASCII 13 inside the JSON string object.

Mist...@honorific.org

Jan 11, 2020, 7:33:20 AM

On Sat, 11 Jan 2020 13:20:56 +0100

A rule which will almost certainly be forgotten about or not tested for by
someone at some point. For network messages the data should never be its own
delimiter. IMO anyway.

David Brown

Jan 11, 2020, 8:28:56 AM

In line-delimited JSON, the delimiter is not part of the data - so it
fulfils your requirement.

Most JSON libraries generate strings without any extra white space
unless you specifically ask for it (which can be useful for human
readability).

And for any communication with any protocol, you have to be sure you
implement the protocol correctly and test appropriately. I can't see
why line-delimited JSON would be any different.

Daniel

Jan 11, 2020, 10:05:38 AM

On Saturday, January 11, 2020 at 6:01:35 AM UTC-5, David Brown wrote:
>
> So I would have been correct if I had stuck to referring to
> line-delimited or newline-delimited JSON.

But what would you have been referring to? This one?

https://github.com/ndjson/ndjson-spec

For interoperability it's preferred to refer to an Internet Standards
Document, but it doesn't seem an RFC exists yet for this one.

Daniel

Tim Rentsch

Jan 11, 2020, 10:10:43 AM

Öö Tiib writes:

> On Friday, 3 January 2020 01:39:54 UTC+2, Chris M. Thomasson wrote:
>
>> On 1/1/2020 11:24 AM, Mr Flibble wrote:
>>
>>> Undefined behaviour aside is C++ union type punning an acceptable hack
>>> or not, given that it is legal in C?
>>> If it works on all the compilers I care about does it really matter that
>>> it is *currently* classed as undefined behaviour?
>>> I would say that everyone has "secretly" fucking done it some time or
>>> other so the C++ Standards Committee should get there act together and
>>> legalize it like it has been in C.
>>> Can anyone provide a straight answer to this?
>>
>> Is there some cryptic hidden thing in the standard that just might
>> "allow" for the following code to run without ub?
>
> Possibly.
>
>> ___________________________
>> #include <iostream>
>>
>> struct base
>> {
>>     int p0;
>> };
>
> Standard layout classes have lot of additional guarantees.
>
>> struct object_0
>> {
>>     base b;
>>     int p0;
>> };
>
> Standard layout with common initial sequence with base.

Unless I'm missing something the two struct types base and object_0
don't have a common initial sequence (or we might say the common
initial sequence is the empty set). The first member of base is of
type int. The first member of object_0 is of type base. These two
types, int and base, are not layout-compatible: they are not the
same type; they are not layout-compatible enumerations; and they
are not layout-compatible standard-layout class types. Hence there
is no common initial sequence.
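
For contrast, a sketch of two types that do share a common initial sequence. The names `header_a`, `header_b`, `either` and `read_tag_via_b` are invented for this illustration and are not from the quoted code. Both structs are standard-layout and both start with a member of type int, so reading that member through the inactive union member is the case the standard permits.

```cpp
#include <cassert>
#include <type_traits>

// Both structs start with an int, so their common initial sequence is
// the single member `tag`.
struct header_a { int tag; float value; };
struct header_b { int tag; char code; };

static_assert(std::is_standard_layout<header_a>::value, "must be standard-layout");
static_assert(std::is_standard_layout<header_b>::value, "must be standard-layout");

union either {
    header_a a;
    header_b b;
};

// Reads `tag` through `b` even when `a` is the active member. Only the
// common initial sequence may be inspected this way; reading `e.b.code`
// while `a` is active would not be covered.
int read_tag_via_b(const either& e) { return e.b.tag; }

int example_usage() {
    either e;
    e.a = header_a{7, 1.5f};  // makes `a` the active member
    assert(read_tag_via_b(e) == 7);
    return read_tag_via_b(e);
}
```

With base and object_0 from the quoted code, the first members have types int and base respectively, so by the reasoning above this rule gives nothing to rely on.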