memcpy/memset/etc overloads for volatile memory


dgutson .

May 18, 2018, 4:41:08 PM
to std-proposals
This might be interesting.

In some systems, the implementation of memcpy() & friends cannot be used for memory-mapped devices.
Usually, mapped memory is marked as volatile.

I'm proposing to add volatile-pointer overloads for the memcpy() function and friends, without specifying exactly what should be different at the standard level, but allowing implementations to provide a particular behavior. I'm just proposing to add the declarations to the standard.

In particular, on some x86-based systems, reading things like the SPI should be done on a byte-per-byte basis (rather than streamed with SIMD instructions).

I'm not sure what the wording should exactly say, but I'm just initiating the discussion.

    Daniel.

--
Who’s got the sweetest disposition?
One guess, that’s who?
Who’d never, ever start an argument?
Who never shows a bit of temperament?
Who's never wrong but always right?
Who'd never dream of starting a fight?
Who get stuck with all the bad luck?

Andrey Semashev

May 18, 2018, 4:46:37 PM
to std-pr...@isocpp.org
On 05/18/18 23:41, dgutson . wrote:
> This might be interesting.
>
> In some systems, the implementation of memcpy() & friends cannot be used
> for memory-mapped devices.
> Usually, mapped memory is marked as volatile.
>
> I'm proposing to add volatile pointers overloads for the memcpy()
> function and friends, without specifying exactly what should be
> different at the standard-level, but allowing implementations to provide
> a particular behavior. I'm just proposing to add the declaration to the
> standard.
>
> Particularly, in some x86-based systems, reading things like the SPI
> should be done on byte-per-byte basis (rather than streamed with any
> SIMD instruction).
>
> I'm not sure what the wording should exactly say, but I'm just
> initiating the discussion.

If it's not specified then the exact behavior wrt. memory accesses is
not going to be portable, which means it's not going to be useful.

dgutson .

May 18, 2018, 4:53:16 PM
to std-proposals
It is useful indeed: if I have a pointer to volatile memory, I cannot use memcpy currently.
By adding the overload, I'm guaranteed that I will be able to call memcpy (and that the implementation shall DTRT, "do the right thing", which is, for example, byte-per-byte copying).
Besides the "DTRT" discussion, my point is: I want this overload to be available. At the very least it should fall back to the non-volatile version.
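
To make the shape of this concrete, here is a rough sketch of what such overloads and a conservative bytewise fallback could look like (nothing below is existing standard C++; the namespace, signatures and semantics are purely illustrative):

    #include <cstddef>

    namespace proposed {   // hypothetical namespace, sketch only

    // A conservative fallback: copy/fill byte by byte through the
    // volatile-qualified pointers, so every access is an observable
    // side effect. Implementations could substitute whatever access
    // pattern the platform actually requires.
    volatile void* memcpy(volatile void* dest, const volatile void* src,
                          std::size_t count)
    {
        auto d = static_cast<volatile unsigned char*>(dest);
        auto s = static_cast<const volatile unsigned char*>(src);
        for (std::size_t i = 0; i != count; ++i)
            d[i] = s[i];
        return dest;
    }

    volatile void* memset(volatile void* dest, int ch, std::size_t count)
    {
        auto d = static_cast<volatile unsigned char*>(dest);
        for (std::size_t i = 0; i != count; ++i)
            d[i] = static_cast<unsigned char>(ch);
        return dest;
    }

    } // namespace proposed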
 



Andrey Semashev

May 18, 2018, 5:13:18 PM
to std-pr...@isocpp.org
What is "the right thing"? Does it have to be a byte-granular operation?
Does it have to be a forward iteration? Is this behavior the same on all
implementations? You have to answer all these questions in the
standard wording so that everyone can rely on the exact memory access
pattern, which is important in the case of volatile memory.

If it is not important in your case then you don't need volatile and
hence the overload.

Personally, I don't see much use in such an overload. Where the memory
access pattern is important, I would rather spell it out explicitly or at
least use std::copy, which, unlike memcpy, does define the order of
iteration and the granularity.

Thiago Macieira

May 18, 2018, 8:16:44 PM
to std-pr...@isocpp.org
On Friday, 18 May 2018 14:13:15 PDT Andrey Semashev wrote:
> What is "the right thing"? Does it have to be a byte-granular operation?
> Does it have to be a forward iteration? Is this behavior the same on all
> implementations? You have to answer on all these questions in the
> standard wording so that everyone can rely on the exact memory access
> pattern, which is important in case of volatile memory.
>
> If it is not important in your case then you don't need volatile and
> hence the overload.

On the other hand, if your memory storage requires a specific behaviour which
cannot be standardised, maybe you should implement the copying directly in
your code. It's not like bytewise memcpy and memset are particularly difficult
to implement...

It's the SIMD-optimised versions with cacheline alignment and such that are.

--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center



dgutson .

May 18, 2018, 9:24:53 PM
to std-proposals


On Fri., May 18, 2018, 21:16, Thiago Macieira <thi...@macieira.org> wrote:
On Friday, 18 May 2018 14:13:15 PDT Andrey Semashev wrote:
> What is "the right thing"? Does it have to be a byte-granular operation?
> Does it have to be a forward iteration? Is this behavior the same on all
> implementations? You have to answer on all these questions in the
> standard wording so that everyone can rely on the exact memory access
> pattern, which is important in case of volatile memory.
>
> If it is not important in your case then you don't need volatile and
> hence the overload.

On the other hand, if your memory storage requires a specific behaviour which
cannot be standardised, maybe you should implement the copying directly in
your code. It's not like bytewise memcpy and memset are particularly difficult
to implement...

Oh yes, you don't know how hard it can get... :) avoid prefetching, avoid SSE/vectorization, ensure the right order of operations (WC regions).
So there are lots of details that a too-smart toolchain may mess up.
However, I will think about whether an execution policy can help here in copy. And I'm not only talking about copying, but also about memsetting.
The Linux kernel has a nice memcpy_toio/fromio worth looking at.



It's the SIMD-optimised with cacheline-alignment and such that are.





Thiago Macieira

May 19, 2018, 2:20:32 AM
to std-pr...@isocpp.org
On Friday, 18 May 2018 18:24:37 PDT dgutson . wrote:
> Oh yes you don't know how hard it can get... avoid prefetching, avoid
> SSE/vectorization, ensure right order of operations (WC regions).
> So there are lots of details that a too smart toolchain may mess.

Even with volatile? The compiler can't change the accesses you make if you use volatile.
So no prefetching, no SSE. The order is exactly the one that you specify.

> However I will think if an execution policy can help here in copy. I'm not
> talking about copying but also memsetting.
> The linux kernel has a nice memcpy_toio/fromio worth to look at.

Does it work for other types of memory besides I/O?

Jens Maurer

May 23, 2018, 4:05:36 AM
to std-pr...@isocpp.org
On 05/19/2018 03:24 AM, dgutson . wrote:
>
>
> On Fri., May 18, 2018, 21:16, Thiago Macieira <thi...@macieira.org> wrote:
>
> On Friday, 18 May 2018 14:13:15 PDT Andrey Semashev wrote:
> > What is "the right thing"? Does it have to be a byte-granular operation?
> > Does it have to be a forward iteration? Is this behavior the same on all
> > implementations? You have to answer on all these questions in the
> > standard wording so that everyone can rely on the exact memory access
> > pattern, which is important in case of volatile memory.
> >
> > If it is not important in your case then you don't need volatile and
> > hence the overload.
>
> On the other hand, if your memory storage requires a specific behaviour which
> cannot be standardised, maybe you should implement the copying directly in
> your code. It's not like bytewise memcpy and memset are particularly difficult
> to implement...
>
>
> Oh yes you don't know how hard it can get... :) avoid prefetching, avoid SSE/vectorization, ensure right order of operations (WC regions).
> So there are lots of details that a too smart toolchain may mess.
> However I will think if an execution policy can help here in copy. I'm not talking about copying but also memsetting.
> The linux kernel has a nice memcpy_toio/fromio worth to look at.

"volatile" only conveys a single bit of information, but
modern architectures need a lot more data to actually do
what you want if the "memory" isn't your usual kind of
RAM with a cache hierarchy on top. For example, it might
make a difference whether you copy bytewise or wordwise (and
both options might be useful in different circumstances).
As another example, you might need to expressly synchronize writes
with later reads.

Since you mention the Linux kernel, note that it has a lot
more than just a volatile memcpy to deal with all these
issues. Maybe there's a general enough abstraction hidden
in there somewhere that we could eventually standardize, but
just having a volatile memcpy is neither here nor there.

Jens

dgutson .

May 23, 2018, 9:31:18 AM
to std-proposals
OK, agreed.

To provide some context for other readers (these are kernel-space issues, but the same considerations should apply here):

I think we could pack some requirements into something like a memory_traits, which would specify:
  - pointer type (optionally volatile-qualified)
  - alignment requirements
  - memory synchronization needs / specifications (an enumerator)
  - vectorization abilities

The challenge is to make this implementation-agnostic enough that implementations can fill in the values.

Then we could provide overloads of the memory primitives (memcpy, memset, std::copy, etc.) taking the memory_traits type as a template argument.

This is related to transactional memory, std::memory_order, and other fences and memory barriers. So maybe the traits would pack this information plus other bits?
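
A very rough sketch of what such traits and a primitive taking them could look like (every name here is hypothetical, nothing is being proposed verbatim):

    #include <cstddef>

    // Purely illustrative: the traits bundle the properties an implementation
    // would need in order to choose an access strategy for a kind of memory.
    enum class sync_kind { none, acquire_release, full_fence };

    struct mmio_traits {                              // e.g. a memory-mapped I/O region
        using pointer = volatile unsigned char*;
        static constexpr std::size_t alignment = 1;   // byte-granular accesses only
        static constexpr sync_kind synchronization = sync_kind::full_fence;
        static constexpr bool vectorizable = false;
    };

    // A copy primitive parameterised on the traits. A real implementation would
    // dispatch on the traits members; this sketch just does the conservative thing.
    template <class Traits>
    void traits_copy(typename Traits::pointer dst,
                     typename Traits::pointer src, std::size_t n)
    {
        for (std::size_t i = 0; i != n; ++i)
            dst[i] = src[i];
    }

    // usage: traits_copy<mmio_traits>(device_buffer, staging_buffer, 64);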



 



Thiago Macieira

May 23, 2018, 11:13:26 AM
to std-pr...@isocpp.org
On Wednesday, 23 May 2018 10:31:15 -03 dgutson . wrote:
> I think that we could think about some requirements packed in something
> like a memory_traits, where it would specify:
> - pointer type (optionally volatile-qualified)
> - alignment requirements
> - memory synchronization needs / specifications (an enumerator)
> - vectorization habilities

Or, like I said, you can roll your own memcpy or memset (which are quite
trivial) if you have specific requirements.

dgutson .

May 23, 2018, 11:33:05 AM
to std-proposals
On Wed, May 23, 2018 at 12:13 PM, Thiago Macieira <thi...@macieira.org> wrote:
On Wednesday, 23 May 2018 10:31:15 -03 dgutson . wrote:
> I think that we could think about some requirements packed in something
> like a memory_traits, where it would specify:
>   - pointer type (optionally volatile-qualified)
>   - alignment requirements
>   - memory synchronization needs / specifications (an enumerator)
>   - vectorization habilities

Or, like I said, you can roll out your own memcpy or memset (which are quite
trivial) if you have specific requirements.

I propose to leave "triviality" out of the discussion, since this can get combinatorially complex, and there is no metric of "triviality" either.
There are many std functions that are arguably trivial as well, and despite that, they are in the STL because people use them.
We can keep arguing about what is trivial and what isn't, and we will end up comparing somebody to Hitler as per Godwin's Law.
Please forgive me, I sincerely respect you, but this is not a discussion I would like to have.
 


dgutson .

May 23, 2018, 11:34:52 AM
to std-proposals
On Wed, May 23, 2018 at 12:33 PM, dgutson . <daniel...@gmail.com> wrote:


On Wed, May 23, 2018 at 12:13 PM, Thiago Macieira <thi...@macieira.org> wrote:
On Wednesday, 23 May 2018 10:31:15 -03 dgutson . wrote:
> I think that we could think about some requirements packed in something
> like a memory_traits, where it would specify:
>   - pointer type (optionally volatile-qualified)
>   - alignment requirements
>   - memory synchronization needs / specifications (an enumerator)
>   - vectorization habilities

Or, like I said, you can roll out your own memcpy or memset (which are quite
trivial) if you have specific requirements.

I propose to leave "triviality" out of the discussion, since this can get combinatoric complex, and there is no metric of "triviality" either.
There are many std functions that are arguably trivial as well, and despite that, they are in the STL because people use them.
We can keep arguing about triviality, what is trivial and what isn't, and we will end up comparing somebody with Hitler as per Godwin's Law.
Please forgive me, I sincerely respect you but this is not a discussion I would like to have.

(especially because, as I mentioned at the beginning, you may also end up trying to control the optimizer to prevent it from being too smart about "too trivial" loops).
 
 





Thiago Macieira

May 23, 2018, 12:05:12 PM
to std-pr...@isocpp.org
On Wednesday, 23 May 2018 12:33:02 -03 dgutson . wrote:
> > Or, like I said, you can roll out your own memcpy or memset (which are
> > quite
> > trivial) if you have specific requirements.
>
> I propose to leave "triviality" out of the discussion, since this can get
> combinatoric complex, and there is no metric of "triviality" either.

Sorry, but my point is that to write:

for (size_t i = 0; i < n; ++i)
    dst[i] = src[i];

is quite trivial and does implement the most basic memcpy for byte-wise,
volatile data. This is what such a generic, volatile memcpy would do and I see
no use in providing it in the standard library since it can't differ from
that. There is no room for optimisation to be applied, unlike the normal
memcpy.

When you start adding requirements, this gets out of hand. The permutations will explode:
- memory fence or no?
- how often should the memory fence be done? Is it per time, per address or per cumulative bytes copied?
- which memory fence (load, store or full)?
- which pointer controls the fence, source or destination?
- can it read more than two bytes at a time? If so, must reads be aligned?
- can it read four bytes at a time? If so, must 4-byte reads be aligned?
- can it read eight... sixteen, thirty-two, sixty-four?
- can misaligned reads cross a cacheline? Can they cross a page boundary?
- repeat the read questions, replacing "read" with "write"
- can it read more than one element (whichever size) before writing it (i.e., can it pipeline)?
- should the stores use non-temporal cache hints? Which cache level?

My point is that the moment you need anything from the list above, you should
roll your own code, for your own purposes. The operating system kernels
can have shared functions for this because they know which hardware they're
going to run on and thus what types of volatile memory behaviour they're going
to face. That is not the case for the C++ standard library.

However, if you want to provide this volatile memcpy as a library, you're
welcome to. You can limit your needs to modern architectures or just one (your
employer's), which should cut down the permutations from millions of
possibilities to just tens. The problem becomes manageable.

Thiago Macieira

May 23, 2018, 12:08:02 PM
to std-pr...@isocpp.org
On Wednesday, 23 May 2018 13:05:07 -03 Thiago Macieira wrote:
> Sorry, but my point is that to write:
>
> for (size_t i = 0; i < n; ++i)
> dst[i] = src[i];

By the way, this is what

    std::copy_n(src, n, dst);

should generate for volatile chars, but the compilation fails with both libc++
and libstdc++, as both try too aggressively to use memmove. Since I don't see
any requirement in the standard besides src and dst being valid InputIterator
and OutputIterator respectively, I believe that's a bug in both
implementations.

Microsoft's implementation compiles. See https://godbolt.org/g/7yMhRy.
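
For illustration, the kind of test case in question presumably looks roughly like this (an assumed reconstruction, not the exact snippet behind the link):

    #include <algorithm>
    #include <cstddef>

    // Whether this compiles depends on the standard library: libstdc++ and
    // libc++ reportedly reject it because they try to dispatch trivially
    // copyable element types to memmove, which takes no volatile pointers;
    // MSVC's library accepts it.
    void copy_volatile(volatile char* dst, volatile char* src, std::size_t n)
    {
        std::copy_n(src, n, dst);   // intended meaning: element-wise volatile accesses
    }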

Niall Douglas

May 27, 2018, 7:09:29 PM
to ISO C++ Standard - Future Proposals
On Friday, May 18, 2018 at 9:41:08 PM UTC+1, dgutson wrote:
This might be interesting.

In some systems, the implementation of memcpy() & friends cannot be used for memory-mapped devices.
Usually, mapped memory is marked as volatile.

I'm proposing to add volatile pointers overloads for the memcpy() function and friends, without specifying exactly what should be different at the standard-level, but allowing implementations to provide a particular behavior. I'm just proposing to add the declaration to the standard.

Just this weekend I, yet again, had to write a custom memcpy() implementation because the built-in one can be elided by the compiler during optimisation. I'd like to stop having to write memcpy() myself.

I think your idea has two variants:
  1. Cannot-be-optimised-out memcpy/memset/memmove etc. I'd suggest secure_memset(), secure_memcpy(), secure_memmove() and so on, rather than versions taking a volatile pointer (a sketch follows below).
  2. The same routines, but with an additional std::memory_order parameter.
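
For the first variant, one common (non-standard) technique is to force every store through a volatile lvalue so it cannot be treated as a dead store; a minimal sketch, assuming a hypothetical secure_memset spelled as above:

    #include <cstddef>

    // "secure_memset" is a hypothetical name matching the suggestion above,
    // not an existing standard function. Writing through a volatile pointer
    // makes each store an observable side effect the optimiser must keep.
    void* secure_memset(void* dst, int value, std::size_t n)
    {
        volatile unsigned char* p = static_cast<volatile unsigned char*>(dst);
        for (std::size_t i = 0; i != n; ++i)
            p[i] = static_cast<unsigned char>(value);
        return dst;
    }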
With respect to your ideas about I/O, you can't do that yet with the current C++ memory model. There is no concept in the current standard of making writes visible to main memory, or to I/O devices, in some sequential order. The memory model only speaks of visibility to threads of execution; it doesn't handle visibility to remote hardware, like network cards, hard drives, graphics cards, etc.

SG1 Concurrency is aware of this limitation, and may in the future decide to address it. I am sure that well-written papers would be welcome on that topic.

I'd certainly like to see papers proposing secure_*() functions land before WG21, perhaps with overloads taking a std::memory_order parameter. The latter should be sent to SG1, I suspect; the former could just go to LEWG.

Niall

j c

May 27, 2018, 8:02:39 PM
to std-pr...@isocpp.org


On Monday, May 28, 2018, Niall Douglas <nialldo...@gmail.com> wrote:
On Friday, May 18, 2018 at 9:41:08 PM UTC+1, dgutson wrote:
This might be interesting.

In some systems, the implementation of memcpy() & friends cannot be used for memory-mapped devices.
Usually, mapped memory is marked as volatile.

I'm proposing to add volatile pointers overloads for the memcpy() function and friends, without specifying exactly what should be different at the standard-level, but allowing implementations to provide a particular behavior. I'm just proposing to add the declaration to the standard.

Just there this weekend I, yet again, had to write a custom memcpy() implementation because the built-in one can be elided by the compiler during optimisation. I'd like to stop having to write memcpy() personally.

I think your idea has two variants:
  1. Cannot be optimised out memcpy/memset/memmove etc. I'd suggest secure_memset(), secure_memcpy(), secure_memmove() and so on rather than editions taking a volatile pointer.
  2. Said routines, but with additional std::memory_order parameter

Or just stop compilers from doing daft things.

dgutson .

May 27, 2018, 8:07:22 PM
to std-proposals


Jens Maurer

May 28, 2018, 8:51:54 AM
to std-pr...@isocpp.org
On 05/28/2018 01:09 AM, Niall Douglas wrote:
> Just there this weekend I, yet again, had to write a custom memcpy()
> implementation because the built-in one can be elided by the compiler
> during optimisation. I'd like to stop having to write memcpy()
> personally.
>
> I think your idea has two variants:
>
> 1. Cannot be optimised out memcpy/memset/memmove etc. I'd suggest
> secure_memset(), secure_memcpy(), secure_memmove() and so on rather
> than editions taking a volatile pointer.

ISO C11 has memset_s in K.3.7.4.1.

Jens

Andrey Semashev

May 28, 2018, 11:47:14 AM
to std-pr...@isocpp.org
Annex K is basically dead and is being considered for removal, as I heard.

There are a few non-portable replacements, though.

http://netbsd.gw.com/cgi-bin/man-cgi?explicit_memset+3+NetBSD-current
https://manpages.debian.org/testing/manpages-dev/bzero.3.en.html

Having a portable equivalent of explicit_memset & co. in the standard
(but not Annex K) would be really useful.
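
For readers unfamiliar with the problem these functions address, a minimal illustration follows; the commented-out alternatives show assumed typical usage of the non-portable functions linked above and of C11's memset_s.

    #include <cstring>

    // Zeroing a buffer that is never read again is a dead store, which the
    // compiler may legally remove.
    void handle_secret()
    {
        char key[32];
        // ... fill and use key ...
        std::memset(key, 0, sizeof key);            // may be optimised away entirely
        // explicit_bzero(key, sizeof key);          // glibc/OpenBSD: not elided
        // explicit_memset(key, 0, sizeof key);      // NetBSD equivalent
        // memset_s(key, sizeof key, 0, sizeof key); // C11 Annex K
    }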