Allow .data() to be called on reserved, but empty, vectors

stevem...@gmail.com

Jul 21, 2017, 10:40:25 PM

to ISO C++ Standard - Future Proposals

Sometimes when using vectors in low-level embedded land it is nice to know where the underlying buffer of a vector is before any data exists in the vector. As it stands it is not possible to do this, even if first calling reserve.

Look at the following code:

std::vector<uint8_t> vec;

vec.reserve(32);

uint64_t bufStart = ...

Assume now that I want to use the vector to hold assembly instructions. The encodings of various instructions are relative to their location in memory, so in order to encode them properly one must know where they are about to live inside the vector. A naive user might try one of the following:

0) bufStart = &vec[0]

1) bufStart = &vec.at(0)

2) bufStart = &vec.front()

3) bufStart = vec.data()

All of these are U.B.

0) Will compile and may even work properly on some compilers, but operator[] is undefined when the vector is empty.

1) Will compile, but at runtime at() will complain that the vector's .size() is too small (it throws std::out_of_range).

2) Will compile but is U.B.

3) Will compile, and even worse, gcc sometimes optimizes it to return 0. Also U.B.

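Number 3 is the one I propose to change. Currently the standard is worded as such:

23.3.6.4 [vector.data]
|
| T* data() noexcept;
| const T* data() const noexcept;
|
| Returns: A pointer such that [data(),data() + size()) is a valid
| range. For a non-empty vector, data() == &front().

Changed to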

23.3.6.4 [vector.data]
|
| T* data() noexcept;
| const T* data() const noexcept;
|
| Returns: A pointer such that [data(),data() + size()) is a valid
| range. For a vector of capacity() > 0, data() must point to the underlying buffer.

This will allow people to grab a pointer to the underlying buffer of a vector if and only if they have reserved space first. I suspect many people incorrectly expect one of the above methods I have listed to work as intended (though they do not).
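A minimal sketch of the scenario above, assuming a mainstream standard library (libstdc++, libc++ or MSVC's). It records data() right after reserve() and checks that the first push_back() lands at that address. Today that equality is implementation behavior, not a guarantee of the current wording; it is the guarantee the proposed wording is after. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper mirroring the use case: learn where the first encoded
// instruction will live before anything has been pushed.
bool first_push_lands_at_reserved_data() {
    std::vector<std::uint8_t> vec;
    vec.reserve(32);

    // Current wording only promises the empty range [data(), data() + 0) here.
    // Mainstream implementations return the start of the reserved block.
    const std::uint8_t* predicted = vec.data();

    vec.push_back(0x90);  // size() 1 <= capacity() >= 32: no reallocation

    // Now data() == &front() is guaranteed; compare it with the prediction.
    return predicted == vec.data();
}
```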

Nicol Bolas

Jul 21, 2017, 11:32:47 PM

to ISO C++ Standard - Future Proposals, stevem...@gmail.com

On Friday, July 21, 2017 at 10:40:25 PM UTC-4, stevem...@gmail.com wrote:

Sometimes when using vectors in low-level embedded land it is nice to know where the underlying buffer of a vector is before any data exists in the vector. As it stands it is not possible to do this, even if first calling reserve.

....

This will allow people to grab a pointer to the underlying buffer of a vector if and only if they have reserved space first. I suspect many people incorrectly expect one of the above methods I have listed to work as intended (though they do not).

To what end?

OK, let's say that we allow you to get a pointer to that buffer. What are you going to do with it? Nothing that's legal in C++, that's for sure.

After all, the standard specifically states that the return from `data` is:

> A pointer such that [data(), data() + size()) is a valid range.

Since `size` is zero, that will be an empty range.

But let's pretend that we allowed you to access up to `capacity` rather than `size`. What good is that? First, there are no `T`'s past the end of the sequence. So pointer arithmetic in that region is dubious, and you can't just write to objects that don't exist.

But equally importantly, there's this:

vector<int> v;
v.reserve(10);
memcpy(v.data(), ..., 10 * sizeof(int));
v.resize(10);

OK, pop-quiz: what values does `v` contain? Here's a hint: it's not what was copied into the reserved space.

`vector::resize` will copy a value if you provide one. If you don't, it will value-initialize the inserted elements. Therefore, `v.resize(10)` will overwrite everything the `memcpy` wrote.

`vector` has no way to resize itself without overwriting what was in the reserved space.
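The well-defined ordering, sketched under the assumption that paying for the zero fill is acceptable: size the vector first, then copy into data(). Once resize(n) has run, [data(), data() + n) is a valid range of real ints, so the memcpy no longer writes past size(). The function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Size first, then copy: resize() value-initializes (zeroes) the ints,
// after which [data(), data() + n) is a valid range to memcpy into.
std::vector<int> copy_into_vector(const int* src, std::size_t n) {
    std::vector<int> v;
    v.resize(n);  // n zero-initialized ints; this is the cost being complained about
    if (n != 0) {
        std::memcpy(v.data(), src, n * sizeof(int));  // overwrites the zeros
    }
    return v;
}
```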

What you really want is a way to default-initialize the members of a `vector`, then fill them in at your leisure:

vector<int> v(10, std::default_init);
memcpy(v.data(), ..., 10 * sizeof(int));

It'd be much better to introduce a way to default-initialize members of containers. And objects of all types.

stevem...@gmail.com

Jul 21, 2017, 11:45:20 PM

to ISO C++ Standard - Future Proposals, stevem...@gmail.com

I really do want to get a pointer to that buffer, and for good reason. If I can get a pointer to that buffer, then I can use std::uninitialized_copy to fill it in at my leisure. There really are cases where I need to know the memory address that my buffer lives at before I put anything into it. Adding default-initialization to all standard types would be cool, but I hardly think that's the easiest solution to this issue. There are also cases where I don't want to pay for this default initialization: if you give me a way to fill that buffer myself and then tell the vector about its change in size(), then there's no reason to pay for default initialization.

Nevin Liber

Jul 21, 2017, 11:46:10 PM

to std-pr...@isocpp.org

On Fri, Jul 21, 2017 at 10:32 PM, Nicol Bolas <jmck...@gmail.com> wrote:

On Friday, July 21, 2017 at 10:40:25 PM UTC-4, stevem...@gmail.com wrote:
Sometimes when using vectors in low-level embedded land it is nice to know where the underlying buffer of a vector is before any data exists in the vector. As it stands it is not possible to do this, even if first calling reserve.

....

This will allow people to grab a pointer to the underlying buffer of a vector if and only if they have reserved space first. I suspect many people incorrectly expect one of the above methods I have listed to work as intended (though they do not).

To what end?

Um, you deleted his use case:

Assume now that I want to use the vector to hold assembly instructions. The encodings of various instructions are relative to their location in memory, so in order to encode them properly one must know where they are about to live inside the vector.

OK, let's say that we allow you to get a pointer to that buffer. What are you going to do with it? Nothing that's legal in C++, that's for sure.

He wants to calculate future addresses, such as a branch forward. That isn't unreasonable. Of course, I have no idea how he plans on executing such code.

However, there is a fairly trivial workaround; namely, make the first element of the vector a dummy element. Given that we want to discourage people from filling in an empty vector by means other than push_back/emplace_back/insert/emplace, it doesn't seem likely such a proposal would get very far.
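A sketch of the dummy-element workaround, assuming the code bytes start right after the dummy. Because the vector is non-empty, data() is guaranteed to point at real storage, and the reserve() guarantee keeps that address stable until size() would exceed capacity(). CodeBuffer and code_start are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical wrapper: element 0 is a dummy, encoded code starts at index 1.
struct CodeBuffer {
    std::vector<std::uint8_t> bytes;

    explicit CodeBuffer(std::size_t max_code_bytes) {
        bytes.reserve(max_code_bytes + 1);  // +1 for the dummy element
        bytes.push_back(0);                 // dummy: makes data() point at real storage
    }

    // Address at which the first encoded instruction will live. data() + 1 is at
    // worst one-past-the-end, which is valid pointer arithmetic. Stable as long as
    // no insertion makes size() exceed capacity().
    std::uintptr_t code_start() const {
        return reinterpret_cast<std::uintptr_t>(bytes.data() + 1);
    }
};
```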

--

Nevin ":-)" Liber <mailto:ne...@eviloverlord.com> +1-847-691-1404

stevem...@gmail.com

Jul 21, 2017, 11:50:33 PM

to ISO C++ Standard - Future Proposals

This is what I currently do, but it's stupid. The vector's reserve() already does a 'real' allocation; there should be some way for me to query where this buffer is. I just need an address so I know how to encode things; if you return an int64_t, that's cool too. But really, there should be a way to know where the buffer lives.

Nevin Liber

Jul 21, 2017, 11:51:32 PM

to std-pr...@isocpp.org

On Fri, Jul 21, 2017 at 10:45 PM, <stevem...@gmail.com> wrote:

I really do want to get a pointer to that buffer, and for good reason. If I can get a pointer to that buffer, then I can use std::uninitialized_copy to fill it in at my leisure. There really are cases where I need to know the memory address that my buffer lives at before I put anything into it. Adding default-initialization to all standard types would be cool, but I hardly think that's the easiest solution to this issue. There are also cases where I don't want to pay for this default initialization: if you give me a way to fill that buffer myself and then tell the vector about its change in size(), then there's no reason to pay for default initialization.

Oh, you wish to break vector's invariants. I am now strongly against this.

stevem...@gmail.com

Jul 21, 2017, 11:53:55 PM

to ISO C++ Standard - Future Proposals

This is all well and good from a puritan standpoint. But at the end of the day people use the STL for real world things and from embedded land I can tell you that I need to know where my buffer is.

Nicol Bolas

Jul 21, 2017, 11:57:29 PM

to ISO C++ Standard - Future Proposals, stevem...@gmail.com

On Friday, July 21, 2017 at 11:53:55 PM UTC-4, stevem...@gmail.com wrote:

This is all well and good from a puritan standpoint. But at the end of the day people use the STL for real world things and from embedded land I can tell you that I need to know where my buffer is.

But we can get the exact same thing without breaking those invariants. So why do it the anti-puritan way when you can get it done and still be puritan?

Does it really matter that much that it be done exactly the way you want it done? What matters is that it has the desired performance.

stevem...@gmail.com

Jul 22, 2017, 12:00:19 AM

to ISO C++ Standard - Future Proposals, stevem...@gmail.com

If adding default initialization to all the types actually happens, then I'm totally for it. I'm weighing the probability of this actually happening. It seems like something like that would come around in C++30, while this could be done tomorrow.

Nevin Liber

Jul 22, 2017, 12:02:29 AM

to std-pr...@isocpp.org

On Fri, Jul 21, 2017 at 10:53 PM, <stevem...@gmail.com> wrote:

This is all well and good from a puritan standpoint.

It has nothing to do with a puritan standpoint.

If you, for instance, copy the vector, copy will not do the right thing. Heck, every querying function in vector (size(), empty(), at(), etc.) except for data() would do the wrong thing. What a horrible library design that would be.

But at the end of the day people use the STL for real world things and from embedded land I can tell you that I need to know where my buffer is.

It isn't hard to write your own class which holds a unique_ptr and a length. No need to abuse vector for this.
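A minimal sketch of such a class, assuming a fixed maximum size is acceptable (it never grows). data() is meaningful from construction even while size() == 0, and the storage is default-initialized, so there is no zero fill. ByteBuffer and its members are hypothetical names, not a proposed standard interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <stdexcept>

// Hypothetical fixed-capacity buffer: one heap allocation plus a length.
class ByteBuffer {
public:
    explicit ByteBuffer(std::size_t capacity)
        : buf_(new std::uint8_t[capacity]),  // default-initialized: no zeroing cost
          capacity_(capacity) {}

    std::uint8_t* data() noexcept { return buf_.get(); }
    std::size_t size() const noexcept { return size_; }
    std::size_t capacity() const noexcept { return capacity_; }

    // Copy raw bytes in. Never reallocates, so data() never moves.
    void append(const void* src, std::size_t n) {
        if (n > capacity_ - size_) throw std::length_error("ByteBuffer full");
        if (n != 0) std::memcpy(buf_.get() + size_, src, n);
        size_ += n;
    }

private:
    std::unique_ptr<std::uint8_t[]> buf_;
    std::size_t capacity_;
    std::size_t size_ = 0;
};
```

The trade-off is that copying, growth and element construction are now the caller's problem, which is exactly the part of vector's interface this thread keeps running into.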

Nicol Bolas

Jul 22, 2017, 12:05:26 AM

to ISO C++ Standard - Future Proposals, stevem...@gmail.com

On Saturday, July 22, 2017 at 12:00:19 AM UTC-4, stevem...@gmail.com wrote:

If adding default initialization to all the types actually happens, then I'm totally for it. I'm weighing the probability of this actually happening. It seems like something like that would come around in C++30, while this could be done tomorrow.

I really hate when people do that. When people say, "Oh, I know we ought to fix the problem directly, but I believe that the standards committee would never go for that, so instead let's do a pre-compromised half-measure that only works for a select few things."

Start with what you actually want, then compromise down if it becomes necessary. Assuming that something will fail before trying it doesn't help anyone.

It's also not actually true. The standards committee doesn't usually take a long time to decide if they want a feature or not. If they don't see it as acceptable at all, that tends to be decided rather quickly.

Also, please don't top-post.

stevem...@gmail.com

Jul 22, 2017, 12:05:58 AM

to ISO C++ Standard - Future Proposals

I'm more than aware this is currently not safe; that is WHY I made this thread, so that vectors can morph into a thing that can finally replace C-style arrays. As it is, I am forced to use C-style arrays, memcpy into them, and then construct a vector from that. STL vectors already do 99% of what a C-style array can do; if we add just a tiny bit more, C-style arrays will finally have no real use case. I don't care AT ALL how that happens; just let me memcpy into a vector by calling reserve and I'll be a happy man.

Nicol Bolas

Jul 22, 2017, 12:08:06 AM

to ISO C++ Standard - Future Proposals, stevem...@gmail.com

On Saturday, July 22, 2017 at 12:05:58 AM UTC-4, stevem...@gmail.com wrote:

I'm more than aware this is currently not safe; that is WHY I made this thread, so that vectors can morph into a thing that can finally replace C-style arrays. As it is, I am forced to use C-style arrays, memcpy into them, and then construct a vector from that. STL vectors already do 99% of what a C-style array can do; if we add just a tiny bit more, C-style arrays will finally have no real use case. I don't care AT ALL how that happens; just let me memcpy into a vector by calling reserve and I'll be a happy man.

But that's not what you want. What you want is to create a vector without initializing its contents (for contents which can go uninitialized).

stevem...@gmail.com

Jul 22, 2017, 12:11:02 AM

to ISO C++ Standard - Future Proposals, stevem...@gmail.com

In an ideal world, yes, filling the vector directly would be the fastest. As you say, that is not nice, since the vector can't know if someone changed its buffer out from under it. I can live with a default-initialization syntax, as long as, once I do this default initialization, I can fill the vector via memcpy safely.

Nevin Liber

Jul 22, 2017, 12:15:24 AM

to std-pr...@isocpp.org

On Fri, Jul 21, 2017 at 11:05 PM, <stevem...@gmail.com> wrote:

I'm more than aware this is currently not safe; that is WHY I made this thread, so that vectors can morph into a thing that can finally replace C-style arrays. As it is, I am forced to use C-style arrays, memcpy into them, and then construct a vector from that.

You haven't addressed my strong objection in that your use case breaks almost the entire interface of vector. If you can't call size(), can't copy it correctly, etc., why do you need vector at all? Please explain why it is difficult to write a class that just manages one heap allocation and a length?

Thiago Macieira

Jul 22, 2017, 12:17:36 AM

to std-pr...@isocpp.org

On Friday, 21 July 2017 19:40:25 PDT stevem...@gmail.com wrote:
> Sometimes when using vectors in low-level embedded land it is nice to know
> where the underlying buffer of a vector is before any data exists in the
> vector. As it stands it is not possible to do this, even if first calling
> reserve.

Call resize(), not reserve(). The reserve call may be ignored, if for any
reason the container does not wish to comply. So the moment you add your
instructions, the container could relocate.

--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center

Thiago Macieira

Jul 22, 2017, 12:18:33 AM

to std-pr...@isocpp.org

On Friday, 21 July 2017 21:05:26 PDT Nicol Bolas wrote:
> I *really* hate when people do that. When people say, "Oh, I know we *ought*
> to fix the problem directly, but I believe that the standards committee
> would never go for that, so instead let's do a pre-compromised half-measure
> that only works for a select few things."

"Can you fix the horn in my car? The brakes aren't working"

Nevin Liber

Jul 22, 2017, 12:20:57 AM

to std-pr...@isocpp.org

On Fri, Jul 21, 2017 at 11:14 PM, Nevin Liber <ne...@eviloverlord.com> wrote:

Please explain why it is difficult to write a class that just manages one heap allocation and a length?

I'm sorry; that is over-engineering. You were not proposing that vector keep track of the length.

Simpler question: why isn't unique_ptr<T[]> the solution to your problem?

Nevin Liber

Jul 22, 2017, 12:27:23 AM

to std-pr...@isocpp.org

On Fri, Jul 21, 2017 at 11:17 PM, Thiago Macieira <thi...@macieira.org> wrote:

Call resize(), not reserve(). The reserve call may be ignored, if for any
reason the container does not wish to comply.

That is incorrect. reserve() may not be ignored. The parameter passed to reserve is the minimum amount of space allocated to the vector (and in all known implementations, if an allocation happens, the exact amount is reserved, but the committee does not wish to standardize that) or an exception is thrown. See [vector.capacity] for more details.
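A small sketch of the [vector.capacity] guarantees described above: after reserve(n), capacity() >= n (or an exception is thrown), and insertions that keep size() <= capacity() do not reallocate, so a pointer taken once the vector is non-empty stays valid. The function name is hypothetical, and it assumes n >= 1.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical check; assumes n >= 1.
bool push_back_within_reserve_is_stable(std::size_t n) {
    std::vector<int> v;
    v.reserve(n);                        // throws length_error/bad_alloc rather than ignoring n
    if (v.capacity() < n) return false;  // cannot happen for a conforming std::vector
    v.push_back(0);                      // non-empty, so data() == &v.front()
    const int* first = v.data();
    while (v.size() < n) v.push_back(static_cast<int>(v.size()));  // never exceeds capacity()
    return first == v.data();
}
```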

stevem...@gmail.com

Jul 22, 2017, 12:27:50 AM

to ISO C++ Standard - Future Proposals

Re-implementing my custom vector class that lets me memcpy into it is totally a possibility, but I'd be re-implementing exactly what the STL does plus making .data() return a valid pointer; there's no reason to do that, and it'd be a horrible waste of time. I could totally use a C-style array wrapped in a unique_ptr, but what happens when I want to add more instructions to my array? Oh, I have to resize, copy, and do all that logic. It's the same question as "why did C++ invent the vector?": C-style arrays do the exact same thing, but vectors do it in a nicer wrapper.

I want to use a vector like a buffer for embedded applications. I am telling you what I face every day. C style fits, C++ does not. I like C++ a lot; let's fix it.

stevem...@gmail.com

Jul 22, 2017, 12:31:59 AM

to ISO C++ Standard - Future Proposals

The vector only reallocates if the capacity is < current size(). That's the whole point of me calling reserve first. Inserting, push_back: all safe as long as the capacity is there.

stevem...@gmail.com

Jul 22, 2017, 12:32:34 AM

to ISO C++ Standard - Future Proposals, stevem...@gmail.com

*capacity < needed size

Nevin Liber

Jul 22, 2017, 12:38:09 AM

to std-pr...@isocpp.org

On Fri, Jul 21, 2017 at 11:27 PM, <stevem...@gmail.com> wrote:

Re-implementing my custom vector class that lets me memcpy into it is totally a possibility, but I'd be re-implementing exactly what the STL does plus making .data() return a valid pointer; there's no reason to do that, and it'd be a horrible waste of time. I could totally use a C-style array wrapped in a unique_ptr, but what happens when I want to add more instructions to my array? Oh, I have to resize, copy, and do all that logic.

I don't see how that just works by having data() always return a pointer to the internal buffer. If you resize the vector, your original data will not be copied.

Please post code where resize and copy work correctly with a vector whose only change is that data() always returns a pointer to the internal buffer. Also show us code for making your motivating case of doing codegen work with resize and copy.

Nicol Bolas

Jul 22, 2017, 12:39:31 AM

to ISO C++ Standard - Future Proposals, stevem...@gmail.com

On Saturday, July 22, 2017 at 12:27:50 AM UTC-4, stevem...@gmail.com wrote:

On Saturday, July 22, 2017 at 12:20:57 AM UTC-4, Nevin ":-)" Liber wrote:
On Fri, Jul 21, 2017 at 11:14 PM, Nevin Liber <ne...@eviloverlord.com> wrote:
Please explain why it is difficult to write a class that just manages one heap allocation and a length?

I'm sorry; that is over-engineering. You were not proposing that vector keep track of the length.

Simpler question: why isn't unique_ptr<T[]> the solution to your problem?
--
Nevin ":-)" Liber <mailto:ne...@eviloverlord.com> +1-847-691-1404

Re-implementing my custom vector class that lets me memcpy into it is totally a possibility, but I'd be re-implementing exactly what the STL does plus making .data() return a valid pointer; there's no reason to do that, and it'd be a horrible waste of time.

The part you're not understanding is that your request is not just "making .data() return a valid pointer". There's more to `vector`'s interface that you aren't aware of, and your implementation would either not conform to the rest of `vector` or invoke UB.

Thiago Macieira

Jul 22, 2017, 2:20:26 AM

to std-pr...@isocpp.org

On Friday, 21 July 2017 21:26:40 PDT Nevin Liber wrote:
> On Fri, Jul 21, 2017 at 11:17 PM, Thiago Macieira <thi...@macieira.org>
>
> wrote:
> > Call resize(), not reserve(). The reserve call may be ignored, if for any
> > reason the container does not wish to comply.
>
> That is incorrect. reserve() may not be ignored. The parameter passed to
> reserve is the minimum amount of space allocated to the vector (and in all
> known implementations, if an allocation happens, the exact amount is
> reserved, but the committee does not wish to standardize that) or an
> exception is thrown. See [vector.capacity] for more details.

That's true for std::vector and the Standard Library containers, but it is not
a general rule for all containers. Reserving space is a hint that you're going
to need at least that amount. The keyword is "hint".

Besides, is there any wording that resize() with the same size on a space-
reserved container must NOT relocate? What happens if you shrunk the
container, is it allowed to relocate?

Thiago Macieira

Jul 22, 2017, 2:20:53 AM

to std-pr...@isocpp.org

On Friday, 21 July 2017 21:31:59 PDT stevem...@gmail.com wrote:
> The vector only reallocates if the capacity is < current size(). That's the
> whole point of me calling reserve first. Inserting, push_back, all safe as
> long as the capacity is there

So why don't you call resize() instead of reserve() ?

Nevin Liber

Jul 22, 2017, 3:07:37 AM

to std-pr...@isocpp.org

On Sat, Jul 22, 2017 at 1:20 AM, Thiago Macieira <thi...@macieira.org> wrote:

That's true for std::vector and the Standard Library containers, but it is not
a general rule for all containers.

There are no general rules for all containers that might exist in the world in the past, present or future. Seems a bit irrelevant, though.

Reserving space is a hint that you're going
to need at least that amount. The keyword is "hint".

If that is how you choose to design your own private containers, who am I to argue? But I don't see how it is relevant to how std::vector works.

Besides, is there any wording that resize() with the same size on a space-
reserved container must NOT relocate?

For vector (I've not looked at basic_string), resize(n) cannot relocate when n <= capacity().
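A short sketch of that rule: growing with resize() up to capacity() only appends elements in place, and shrinking only erases, so data() does not change in either direction. The function name is hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical check of resize() within capacity().
bool resize_within_capacity_keeps_data() {
    std::vector<char> v;
    v.reserve(16);
    v.resize(8);                  // grow within capacity()
    const char* p = v.data();
    v.resize(v.capacity());       // grow to exactly capacity(): still no reallocation
    bool same = (p == v.data());
    v.resize(4);                  // shrinking erases in place; it never reallocates
    return same && p == v.data();
}
```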

What happens if you shrunk the
container, is it allowed to relocate?

shrink_to_fit() can relocate, but that call is just a non-binding hint.

--

Vishal Oza

Jul 22, 2017, 6:32:09 AM

to ISO C++ Standard - Future Proposals

There seems to be a lot of discussion about using std::vector as a raw buffer for types, but could we introduce a new class that works with raw memory, rather than changing std::vector to become a raw buffer?

inkwizyt...@gmail.com

Jul 22, 2017, 6:50:50 AM

to ISO C++ Standard - Future Proposals, stevem...@gmail.com

On Saturday, July 22, 2017 at 5:32:47 AM UTC+2, Nicol Bolas wrote:

On Friday, July 21, 2017 at 10:40:25 PM UTC-4, stevem...@gmail.com wrote:

Sometimes when using vectors in low-level embedded land it is nice to know where the underlying buffer of a vector is before any data exists in the vector. As it stands it is not possible to do this, even if first calling reserve.

....

This will allow people to grab a pointer to the underlying buffer of a vector if and only if they have reserved space first. I suspect many people incorrectly expect one of the above methods I have listed to work as intended (though they do not).

To what end?

OK, let's say that we allow you to get a pointer to that buffer. What are you going to do with it? Nothing that's legal in C++, that's for sure.

After all, the standard specifically states that the return from `data` is:

> A pointer such that [data(), data() + size()) is a valid range.

Since `size` is zero, that will be an empty range.

But let's pretend that we allowed you to access up to `capacity` rather than `size`. What good is that? First, there are no `T`'s past the end of the sequence. So pointer arithmetic in that region is dubious, and you can't just write to objects that don't exist.

But `size` can change over time. I think the basic idea that `data` returns a valid pointer is useful. Of course, not for manually filling the vector, but for sharing the same memory between different components.
Imagine something like this:

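int id = 13;
std::vector<std::byte> v;
v.reserve(1024);
message_buffer(id, v.data()); // illustrative API: registers the buffer address

//some time after
v.clear();
v.push_back(std::byte{1}); // std::byte does not convert implicitly from int
v.push_back(std::byte{2});
v.push_back(std::byte{3});
v.push_back(std::byte{0}); //or `memcpy` of some struct to `v.data()` with correct size
message_notyfy(id);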

With this you would have a stable memory region that can be shared with C code, or, in high-performance code, you could simply skip one indirection.

Victor Dyachenko

Jul 24, 2017, 4:53:07 AM

to ISO C++ Standard - Future Proposals, stevem...@gmail.com

Maybe it's time to reanimate the std::dynarray proposal? We can't have both high-level invariants for std::vector and full low-level access to its internal implementation.

Victor Dyachenko

Jul 24, 2017, 4:56:31 AM

to ISO C++ Standard - Future Proposals, stevem...@gmail.com

On Monday, July 24, 2017 at 11:53:07 AM UTC+3, Victor Dyachenko wrote:

Maybe it's time to reanimate the std::dynarray proposal?

without the "magic" on-stack allocation that killed the original proposal

Nicol Bolas

Jul 24, 2017, 12:58:41 PM

to ISO C++ Standard - Future Proposals, stevem...@gmail.com

But without that, `dynarray` is nothing more than a `vector` that you can't resize. And thus, the only substantive difference will be that `sizeof(dynarray<T>)` will be smaller than `sizeof(vector<T>)`. I don't think that's a good enough reason to standardize such a type.

That's not to say that I like the "magic on-stack allocation" of the original. But you have to admit, without that, there just isn't much need for the type. And it still wouldn't be able to do what the OP of this thread wants.

Arthur O'Dwyer

Jul 24, 2017, 6:49:48 PM

to ISO C++ Standard - Future Proposals, stevem...@gmail.com

On Friday, July 21, 2017 at 9:05:58 PM UTC-7, stevem...@gmail.com wrote:

I'm more than aware this is currently not safe; that is WHY I made this thread, so that vectors can morph into a thing that can finally replace C-style arrays. As it is, I am forced to use C-style arrays, memcpy into them, and then construct a vector from that. STL vectors already do 99% of what a C-style array can do; if we add just a tiny bit more, C-style arrays will finally have no real use case. I don't care AT ALL how that happens; just let me memcpy into a vector by calling reserve and I'll be a happy man.

Your thread started off great — let's have v.data() do the intuitive thing for reserved vectors! — but then devolved into this kind of thing, which is problematic.

What you seem to actually want is .resize():

std::vector<char> v;

v.resize(10);

memcpy(v.data(), "some data", 10); // excellent, works fine

Notice that calling .resize() on your average vector<char> will zero-initialize the contents. If you want default-initialization, you'll have to use a default-constructing allocator type, such as this.
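One common shape for a default-initializing allocator adaptor is sketched below. It is a widely used pattern, not a standard component, and the name default_init_allocator is illustrative. Construction requests with no arguments become default-initialization, so resize() on trivial types skips the zero fill; every other construction is forwarded to the wrapped allocator.

```cpp
#include <cassert>
#include <memory>
#include <new>
#include <type_traits>
#include <utility>
#include <vector>

// Illustrative adaptor: wraps A and changes only argument-less construction.
template <class T, class A = std::allocator<T>>
class default_init_allocator : public A {
    using traits = std::allocator_traits<A>;

public:
    template <class U>
    struct rebind {
        using other =
            default_init_allocator<U, typename traits::template rebind_alloc<U>>;
    };

    using A::A;  // inherit A's constructors (including the converting one)

    // No arguments: default-initialize instead of value-initializing.
    template <class U>
    void construct(U* p) noexcept(std::is_nothrow_default_constructible<U>::value) {
        ::new (static_cast<void*>(p)) U;
    }

    // Any arguments: forward to the wrapped allocator as usual.
    template <class U, class... Args>
    void construct(U* p, Args&&... args) {
        traits::construct(static_cast<A&>(*this), p, std::forward<Args>(args)...);
    }
};
```

With this, resize() leaves new ints indeterminate, so they must be written before being read or copied.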

However, I do think that what you originally asked for is useful in obscure cases, and doesn't cost us anything to standardize:

std::vector<char> v;

v.reserve(10);

char *p = v.data(); // currently UB; you proposed making this OK

v.resize(5);

assert(p == v.data()); // assert that the vector should not have reallocated

I have an example implementation here. (The example implementation is not surprising and matches all existing implementations I'm aware of; what you originally asked for was basically just a change in the wording to match what vendors already do.)

HTH,

Arthur

stevem...@gmail.com

Jul 24, 2017, 8:23:55 PM

to ISO C++ Standard - Future Proposals, stevem...@gmail.com

Yes, I'll admit I conflated a few of my wishes for vectors; looking back, that was clearly not wise. I do very much agree, though, that .data() should be reworded. I was very surprised to learn it was U.B. to call it even on a reserved vector, and I do agree it can be useful in a limited but important set of cases.

Nicol Bolas

Jul 24, 2017, 8:34:45 PM

to ISO C++ Standard - Future Proposals, stevem...@gmail.com

On Monday, July 24, 2017 at 6:49:48 PM UTC-4, Arthur O'Dwyer wrote:

OK, so... what exactly would you change the wording to? The current wording is:

> A pointer such that `[data(), data() + size())` is a valid range. For a non-empty vector, `data() == addressof(front())`.

If the `vector` is empty, what does the pointer point to? What are you guaranteeing about that pointer? You can't guarantee that it points to a valid range, since there isn't one. And you can't just say that it points to something that will be a valid range, since... what exactly does that even mean?

So what is it pointing to? The internal allocation, cast to a `T*`?

Arthur O'Dwyer

Jul 24, 2017, 8:47:40 PM

to ISO C++ Standard - Future Proposals, stevem...@gmail.com

On Mon, Jul 24, 2017 at 5:34 PM, Nicol Bolas <jmck...@gmail.com> wrote:

On Monday, July 24, 2017 at 6:49:48 PM UTC-4, Arthur O'Dwyer wrote:

Your thread started off great — let's have v.data() do the intuitive thing for reserved vectors! [...]

However, I do think that what you originally asked for is useful in obscure cases, and doesn't cost us anything to standardize:

std::vector<char> v;
v.reserve(10);
char *p = v.data(); // currently UB; you proposed making this OK
v.resize(5);
assert(p == v.data()); // assert that the vector should not have reallocated

I have an example implementation here. (The example implementation is not surprising and matches all existing implementations I'm aware of; what you originally asked for was basically just a change in the wording to match what vendors already do.)

OK, so... what exactly would you change the wording to? The current wording is:

> A pointer such that `[data(), data() + size())` is a valid range. For a non-empty vector, `data() == addressof(front())`.

If the `vector` is empty, what does the pointer point to? What are you guaranteeing about that pointer? You can't guarantee that it points to a valid range, since there isn't one. And you can't just say that it points to something that will be a valid range, since... what exactly does that even mean?

So what is it pointing to? The internal allocation, cast to a `T*`?

Yes, exactly; that's what it points to.

The only question is, how do we express that real-world requirement opaquely and obscurely enough to satisfy the Committee? And that's a question that would require a real proposal in order to answer. I may or may not write one.

A strawman wording to start out with might be:

A pointer such that `[data(), data() + size())` is the range of data elements assigned to the vector. For a non-empty vector, `data() == addressof(front())`. Reallocation (such as by a call to `reserve()`) invalidates the pointer returned by a previous call to `data()`; as long as no reallocation happens, the pointer remains valid.

By logical deduction on this newly added wording, we would be able to conclude that the pointer returned from data() prior to a non-reallocating push_back() must be the same pointer as would have been returned after the push_back(), and thus that the pointer returned by data() must necessarily point to the address of front() even when front() hasn't yet been constructed. But we avoid saying so explicitly, and thus preserve the sacred mystery of std::vector against those philistines who would callously reduce it to a simple dynamically allocated array. :)

–Arthur

stevem...@gmail.com

Jul 24, 2017, 9:19:16 PM

to ISO C++ Standard - Future Proposals, stevem...@gmail.com

I think this is a little too strongly worded. It doesn't let .data() return an invalid pointer when the vector is empty() and its capacity() is zero. I think it should be something like this:

23.3.6.4 [vector.data]

| Returns: A pointer such that [data(),data() + size()) is a valid
| range. For a non-empty vector, data() == &front().

To

23.3.6.4 [vector.data]

| Returns: A pointer such that [data(),data() + size()) is a valid
| range. For a non-empty vector, data() == &front(). For a vector with capacity() > 0, this pointer shall not change on subsequent calls unless a reallocation occurs; if a reallocation occurs (such as by a call to 'reserve()'), this pointer may be invalidated and a new one returned. It is undefined what happens if this pointer is dereferenced before size() > 0.

This guarantees that when the vector is non-empty, the pointer is to the front. It also guarantees that when someone has first called reserve but the vector is still empty (capacity() > 0 && size() == 0), the pointer points to the same place where the next push_back will construct its element. We also state that this pointer cannot be dereferenced, because it obviously doesn't make sense to allow someone to access possibly garbage data. But this should be enough to do any pointer logic to check for reallocations, or to store the pointer for later use until it's backed by valid data.

We want to give implementations freedom when capacity() == 0.
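For the reallocation-check use case, there is a portable alternative that works under the current wording. Instead of comparing data() pointers, compare sizes against capacity(). This is a sketch, and would_reallocate is an illustrative name, not a standard function.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: predicts whether inserting `extra` more elements into `v`
// would exceed capacity() and therefore reallocate. It needs no pointer from an
// empty vector, only the guarantee that insertions within capacity() do not
// reallocate. size() <= capacity() always holds, so the subtraction cannot wrap.
template <class T, class Alloc>
bool would_reallocate(const std::vector<T, Alloc>& v, std::size_t extra) {
    return extra > v.capacity() - v.size();
}
```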

Barry Revzin

Jul 25, 2017, 10:53:24 AM7/25/17

to ISO C++ Standard - Future Proposals, stevem...@gmail.com

So what is it pointing to? The internal allocation, cast to a `T*`?

Yes, exactly; that's what it points to.
The only question is, how do we express that real-world requirement opaquely and obscurely enough to satisfy the Committee?

Honestly, what was the possible point of that phrasing?

But we avoid saying so explicitly, and thus preserve the sacred mystery of std::vector against those philistines who would callously reduce it to a simple dynamically allocated array. :)

Charming.

Hyman Rosen

Jul 25, 2017, 12:02:26 PM7/25/17

to std-pr...@isocpp.org, stevem...@gmail.com

On Tue, Jul 25, 2017 at 10:53 AM, Barry Revzin <barry....@gmail.com> wrote:

Honestly, what was the possible point of that phrasing?

Part of the frustration of having the library in the Standard is the attempt to specify behavior without reference to implementation.

Edward Catmur

Jul 25, 2017, 7:27:42 PM7/25/17

to ISO C++ Standard - Future Proposals, stevem...@gmail.com

The pointer has to be valid when capacity() == 0, in the sense that it can participate in arithmetic operations with an offset of 0, since [data(), data() + size()) is a valid range even when size() and capacity() are both 0. I agree though that it would be best to avoid the term "valid" as much as possible, lest it give the impression that the pointer can be indirected or used in a placement new expression. Experience notwithstanding, it is surely unnecessary for the Standard to labor the point that the contiguous range [data(), data() + size()) is all that can be formed.

To avoid a double negative, I would suggest "The value returned by data() is changed only by operations that cause reallocation. [Note: operations that insert and erase elements change the extent of the valid range [data(), data() + size()) but not its start. -- end note]"

It would also be necessary to state that swap() exchanges data() along with capacity() and the elements.

I see one possible change in behavior: currently it is permissible for an implementation given a call to reserve() on an empty vector to defer any allocation to the first insert (and then allocate the full amount required by capacity()); this would no longer be possible (since data() is noexcept).
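For non-empty vectors, the behaviour Edward's wording describes can already be observed. The sketch below uses only std::vector with the default allocator, and the function names are illustrative.

```cpp
#include <vector>

// Insertions and erasures that stay within capacity() change the extent of
// [data(), data() + size()). They leave the first element, and hence data(),
// in place, as long as the vector stays non-empty.
bool start_survives_insert_and_erase() {
    std::vector<int> v{1, 2, 3, 4};
    v.reserve(16);               // may reallocate, so read data() afterwards
    const int* start = v.data();
    v.erase(v.begin() + 2);      // {1, 2, 4}: only positions >= 2 are invalidated
    v.insert(v.begin() + 1, 9);  // {1, 9, 2, 4}: size 4 <= capacity, no reallocation
    return v.data() == start && v[0] == 1 && v.size() == 4;
}

// swap() exchanges the buffers without moving elements, so each vector's
// data() ends up in the other one.
bool swap_exchanges_data() {
    std::vector<int> a{1, 2, 3};
    std::vector<int> b{4, 5};
    const int* pa = a.data();
    const int* pb = b.data();
    a.swap(b);
    return a.data() == pb && b.data() == pa;
}
```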

Nevin Liber

Jul 25, 2017, 10:59:34 PM7/25/17

to std-pr...@isocpp.org

On Tue, Jul 25, 2017 at 6:27 PM, Edward Catmur <e...@catmur.co.uk> wrote:

I see one possible change in behavior: currently it is permissible for an implementation given a call to reserve() on an empty vector to defer any allocation to the first insert (and then allocate the full amount required by capacity()); this would no longer be possible (since data() is noexcept).

How exactly is that possible, given that the post condition for reserve(n) is capacity() >= n? If you defer the allocation, how can vector guarantee the allocation succeeds, especially since the allocation takes place in the allocator and not by vector itself?

Nicol Bolas

Jul 25, 2017, 11:30:20 PM7/25/17

to ISO C++ Standard - Future Proposals

On Tuesday, July 25, 2017 at 10:59:34 PM UTC-4, Nevin ":-)" Liber wrote:

On Tue, Jul 25, 2017 at 6:27 PM, Edward Catmur <e...@catmur.co.uk> wrote:

I see one possible change in behavior: currently it is permissible for an implementation given a call to reserve() on an empty vector to defer any allocation to the first insert (and then allocate the full amount required by capacity()); this would no longer be possible (since data() is noexcept).

How exactly is that possible, given that the post condition for reserve(n) is capacity() >= n?

Simple: you return the new capacity, but you didn't allocate any of the memory behind it yet.

If you defer the allocation, how can vector guarantee the allocation succeeds, especially since the allocation takes place in the allocator and not by vector itself?

That's the sticking point. If there is unused capacity in a `vector`, if `size()` < `capacity()`, then a vector is not allowed to fail due to allocation errors. So it's unclear how an implementation could implement this requirement and still not allocate upon `reserve` > `capacity`.

stevem...@gmail.com

Jul 26, 2017, 10:35:02 AM7/26/17

to ISO C++ Standard - Future Proposals, stevem...@gmail.com

This pointer can be null when capacity is zero and still form a valid range. I think it best to require that when capacity > 0, this pointer must be a "real", non-null pointer that points to the same place the first element will be placed at (i.e. to the buffer). This would mean that allocations can no longer be deferred, but I don't see how that's an issue; I'd rather expect it to be a perk. Someone using reserve likely expects that to be the point where the allocation occurs.

My proposal of the wording still stands:

Edward Catmur

Jul 26, 2017, 1:19:44 PM7/26/17

to std-pr...@isocpp.org

Am I missing something? [vector.modifiers] just says that in case no reallocation occurs on push_back() all pointers, iterators and references to elements remain valid; an empty vector has no elements, so this is trivially true. Is this somewhere within the overall library, container or sequence container requirements?

Edward Catmur

Jul 26, 2017, 1:29:32 PM7/26/17

to std-pr...@isocpp.org

Yes, certainly. That means there's no need to restrict a data() stability clause to the case of nonzero capacity(). Your wording is equally workable if you strike the "capacity > 0" precondition.

I think it best to require that when the capacity > 0 this pointer must be a "real - non-zero" pointer that points to the same place that the first element will be placed at (i.e to the buffer). This would mean that allocations can no longer be deferred, but I don't see how that's an issue, I'd rather expect it to be a perk. Someone using reserve likely expects that to be the point where the allocation occurs.

From the point of view of a user, certainly; from the point of view of an implementer it's an additional constraint. For example, they would be precluded from deferring allocation to save resources if they have observed users calling reserve() unnecessarily on vectors that they do not then use. They would also be unable to diagnose such misuse of the API by checking on destruction whether reserve()d capacity has been used. Remember that the Standard is written for the benefit of implementers as much as for users.

Arthur O'Dwyer

Jul 26, 2017, 2:55:44 PM7/26/17

to ISO C++ Standard - Future Proposals

Under the wording for "reserve" itself, N4659 says: "No reallocation shall take place during insertions that happen after a call to reserve() until the time when an insertion would make the size of the vector greater than the value of capacity()."

The Standard never exactly defines what it means by "reallocation" AFAIK, but my interpretation is that "reallocation" means "potentially make a call to std::allocator_traits<V::allocator_type>::allocate(...)." If allocate() can throw an exception or produce any other observable side effect (such as logging messages), then "reallocation" is the time when that exception or side effect must take place. Under my interpretation of the word "reallocation", it is strictly forbidden for that side effect to take place later, e.g. during push_back(), since at that time "no reallocation shall take place" according to N4659.

vector<int> v;
assert(v.size() == 0);
v.reserve(100);
assert(100 <= v.capacity());
assert(v.size() < v.capacity()); // because 0 < 100
v.emplace_back(); // "no reallocation shall take place" here

Therefore (under the above interpretation) it is not "permissible for an implementation given a call to reserve() on an empty vector to defer any allocation to the first insert".

I think this applies to plain old std::vector<T, std::allocator<T>> even on a sufficiently clever compiler, because std::allocator<T> is required to defer to ::operator new, and ::operator new might be overridden by the user (with a definition that produces observable side effects), and we can't possibly know that until link time. However, I admit that a reasonable person might take the wording in [allocator.members]/2 "the storage is obtained by calling ::operator new, but it is unspecified when or how often this function is called" to be evidence in favor of a sufficiently clever compiler actually being able to defer allocations until insert-time, for containers that use std::allocator<T>. For containers that use MyCustomAllocator<T>, the wording in [allocator.members]/2 is irrelevant and everything stands or falls on the interpretation of the word "reallocation" above.

Doing my part for the sacred mysteries,

–Arthur
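Arthur's reading can be made concrete with an allocator whose allocate() has a visible side effect. In this sketch, CountingAllocator and its counter are hypothetical test scaffolding. Mainstream implementations perform the allocation inside reserve(). Under the quoted reserve() wording, the push_backs that stay within capacity() must not reallocate.

```cpp
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical allocator that counts allocate() calls, giving allocation an
// observable side effect, as in the argument about custom allocators.
static int g_allocate_calls = 0;

template <class T>
struct CountingAllocator {
    using value_type = T;
    CountingAllocator() = default;
    template <class U>
    CountingAllocator(const CountingAllocator<U>&) noexcept {}
    T* allocate(std::size_t n) {
        ++g_allocate_calls;
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) noexcept { ::operator delete(p); }
};

template <class T, class U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) { return true; }
template <class T, class U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) { return false; }

struct AllocationCounts {
    int by_reserve;    // allocate() calls made during reserve()
    int by_push_back;  // allocate() calls made by push_back()s within capacity()
};

AllocationCounts observe_reserve_then_insert() {
    std::vector<int, CountingAllocator<int>> v;
    int start = g_allocate_calls;  // measured after construction, so only reserve/push_back count
    v.reserve(100);
    int after_reserve = g_allocate_calls;
    for (int i = 0; i < 100; ++i) v.push_back(i);  // size never exceeds capacity()
    return {after_reserve - start, g_allocate_calls - after_reserve};
}
```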

Nicol Bolas

Jul 26, 2017, 4:08:34 PM7/26/17

to ISO C++ Standard - Future Proposals

On Wednesday, July 26, 2017 at 2:55:44 PM UTC-4, Arthur O'Dwyer wrote:

On Wed, Jul 26, 2017 at 10:19 AM, 'Edward Catmur' via ISO C++ Standard - Future Proposals <std-pr...@isocpp.org> wrote:
On Wed, Jul 26, 2017 at 4:30 AM, Nicol Bolas <jmck...@gmail.com> wrote:
On Tuesday, July 25, 2017 at 10:59:34 PM UTC-4, Nevin ":-)" Liber wrote:
On Tue, Jul 25, 2017 at 6:27 PM, Edward Catmur <e...@catmur.co.uk> wrote:

I see one possible change in behavior: currently it is permissible for an implementation given a call to reserve() on an empty vector to defer any allocation to the first insert (and then allocate the full amount required by capacity()); this would no longer be possible (since data() is noexcept).

How exactly is that possible, given that the post condition for reserve(n) is capacity() >= n?

Simple: you return the new capacity, but you didn't allocate any of the memory behind it yet.

If you defer the allocation, how can vector guarantee the allocation succeeds, especially since the allocation takes place in the allocator and not by vector itself?

That's the sticking point. If there is unused capacity in a `vector`, if `size()` < `capacity()`, then a vector is not allowed to fail due to allocation errors. So it's unclear how an implementation could implement this requirement and still not allocate upon `reserve` > `capacity`.

Am I missing something? [vector.modifiers] just says that in case no reallocation occurs on push_back() all pointers, iterators and references to elements remain valid; an empty vector has no elements, so this is trivially true. Is this somewhere within the overall library, container or sequence container requirements?

Under the wording for "reserve" itself, N4659 says: "No reallocation shall take place during insertions that happen after a call to reserve() until the time when an insertion would make the size of the vector greater than the value of capacity()."

The Standard never exactly defines what it means by "reallocation" AFAIK,

Actually, that might make for a really good defect report. The standard defines the effect of reallocation on iterators et al. in several places, but it defines them all differently in those places. For example:

> Reallocation invalidates all the references, pointers, and iterators referring to the elements in the sequence.

> Reallocation invalidates all the references, pointers, and iterators referring to the elements in the sequence as well as the past-the-end iterator.

The first one is from `reserve`; the second is from `shrink_to_fit`. Notice the difference? Yes, the "past-the-end" iterator bit is not stated in `reserve`. By that reasoning, if you `reserve` elements, your past-the-end iterator should always be fine, but not if `shrink_to_fit` provokes reallocation.

So yes, the meaning of reallocation needs to be spelled out and centralized, so that there can be no further errors of this type. `basic_string` should likewise be adjusted.

Ross Smith

Jul 26, 2017, 4:46:59 PM7/26/17

to std-pr...@isocpp.org

On 2017-07-26 11:27, Edward Catmur wrote:
>
> To avoid a double negative, I would suggest "The value returned by
> data() is changed only by operations that cause reallocation. [Note:
> operations that insert and erase elements change the extent of the valid
> range [data(), data() + size()) but not its start. -- end note]"
>
> It would also be necessary to state that swap() exchanges data() along
> with capacity() and the elements.

It might also be necessary to specify the behaviour when moving a
vector, depending on what people decide is reasonable. If we write
vec2=std::move(vec1), do we expect the new vec2.data() to equal the old
vec1.data() always?

Ross Smith

Ville Voutilainen

Jul 26, 2017, 4:52:17 PM7/26/17

to ISO C++ Standard - Future Proposals

That depends on what the actual types of vec2 and vec1 are and whether
poc2 is const or not, since you need to take
POCCA and POCMA into account.

Ville Voutilainen

Jul 26, 2017, 4:53:12 PM7/26/17

to ISO C++ Standard - Future Proposals

And I mean vec2 instead of poc2. Once the mind wanders into the
POCCA/POCMA-land, funny
things happen.

Ville Voutilainen

Jul 26, 2017, 4:53:44 PM7/26/17

to ISO C++ Standard - Future Proposals

And to add insult to injury, I mean vec1, not vec2. :)
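With the default allocator, Ross's question can be checked directly. This sketch assumes std::allocator, whose propagate_on_container_move_assignment trait is true, and mainstream implementations then transfer the buffer. A stateful allocator may have that trait false and have instances that compare unequal. In that case the elements may have to be moved into fresh storage instead, which is the POCMA point above.

```cpp
#include <memory>
#include <utility>
#include <vector>

// With std::allocator, propagate_on_container_move_assignment is true, so move
// assignment is allowed to simply take over the source's buffer.
static_assert(std::allocator_traits<std::allocator<int>>::
                  propagate_on_container_move_assignment::value,
              "std::allocator propagates on move assignment");

bool move_assignment_keeps_buffer() {
    std::vector<int> vec1{1, 2, 3};
    std::vector<int> vec2{7};
    const int* old = vec1.data();
    vec2 = std::move(vec1);  // mainstream implementations transfer the buffer here
    return vec2.data() == old;
}
```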

Dan Raviv

Jul 26, 2017, 5:20:01 PM7/26/17

to std-pr...@isocpp.org

So it's unclear whether reserve() is allowed to postpone the actual allocation?

I rely on it to do the actual allocation immediately, rather than deferring it, when preparing for real-time rendering code that must not allocate.

I guess I should use resize() instead. Though, to maintain the "logical" interface of the vector, as if no elements have been inserted, that resize should be immediately followed by a resize(0). Pretty ugly.

Nevin Liber

Jul 26, 2017, 5:49:44 PM7/26/17

to std-pr...@isocpp.org

On Wed, Jul 26, 2017 at 4:19 PM, Dan Raviv <dan....@gmail.com> wrote:

So it's unclear whether reserve() is allowed to postpone the actual allocation?

I don't think it is unclear. If you think it is, file a defect report, because AFAIK that isn't intentional.

I rely on it to do the actual allocation and defer it to later

So do many others, including myself.

Ville Voutilainen

Jul 26, 2017, 6:02:34 PM7/26/17

to ISO C++ Standard - Future Proposals

On 27 July 2017 at 00:49, Nevin Liber <ne...@eviloverlord.com> wrote:
> On Wed, Jul 26, 2017 at 4:19 PM, Dan Raviv <dan....@gmail.com> wrote:
>>
>> So it's unclear whether reserve() is allowed to postpone the actual
>> allocation?
>
>
> I don't think it is unclear. If you think it is, file a defect report,
> because AFAIK that isn't intentional.

Don't waste our time with a nonsense "defect" report.
[vector.capacity]/6:

Bengt Gustafsson

Jul 26, 2017, 6:11:11 PM7/26/17

to ISO C++ Standard - Future Proposals, stevem...@gmail.com

I agree with Nicol. The snippet he wrote is what we need. I don't see how the rest of this thread can get around one fact: even if you could get a valid pointer after just calling reserve, you can't resize after filling the buffer without overwriting it.

vector<int> v(10, std::default_init); memcpy(v.data(), ..., 10 * sizeof(int));

I think this is the only thing we need: a way to get default construction instead of value construction. There are numerous cases where you want to call legacy functions that take a T* and let them fill your pre-sized vector, but you don't want the overhead of value construction. This would solve the OP's problem. He always needs to know the size before resizing anyway, so he can just as well resize first, observe the allocation address, and then use operator[] instead of push_back.

This feature could be complemented by a rule that says that if is_constructible_v<T, std::default_init> is true, then the newly added vector constructor calls the T(std::default_init) constructor rather than the T() constructor. This would allow certain performance-critical types, such as a Point class, to provide distinct value and default constructors (albeit with somewhat misleading names):

struct Point {
    Point() : x(0), y(0) {} // value constructor
    Point(std::default_init) {} // default constructor
    Point(int _x, int _y) : x(_x), y(_y) {}
    int x, y;
};

This would allow a vector of Points to be resized without zeroing all the x and y members. This is of course less commonly needed than buffers of built in types but seems to be a logical extension if a std::default_init constructor/resize signature is added to vector (and maybe other containers, at least deque).

This could be extended, with the help of core language features, to allow the same distinction for variables or at least new expressions:

T x(); // Unfortunately not a value constructed variable but a forward declared function...
T y; // default constructed
T z(std::default_init()); // default constructed
T* a = new T; // default constructed
T* b = new T(); // value constructed
T* c = new T[100]; // default constructed array
T* d = new T[100](); // New syntax required to get value construction

The problems I could find are highlighted in bold. Note, however, that there is no backward-compatibility issue: there is currently no std::default_init type, so no one can have written a constructor with this type as a parameter. Not being able to value construct variables when a type specifies a default constructor seems very restrictive, though. As we are already in the area of mixing the language with the std library, we could add another magic type, std::value_init, and an operator new() overload to handle these cases:

T x(std::value_init()); // value constructed
T y; // default constructed
T z(std::default_init()); // default constructed
T* a = new T; // default constructed
T* b = new T(); // value constructed
T* c = new T[100]; // default constructed array
T* d = new(std::value_init()) T[100]; // New overloaded new to get value construction

Still annoying to have to write that much to inform the parser that you are not declaring a function x but notably this only happens if there is a separate value vs. default constructor. If there is no constructor taking std::default_init the T() constructor will be used in all cases.
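C++11 brace initialisation already provides a spelling for part of the variable-definition problem raised here. Below is a sketch with a plain aggregate, where Pod is illustrative. It does not give a type a separate "default" constructor, which is what the std::default_init idea is after.

```cpp
#include <memory>

struct Pod {
    int x, y;  // no user-provided constructor
};

// Since C++11, braces spell value-initialisation without the "most vexing
// parse": `Pod p{};` defines a zeroed variable, whereas `Pod p();` would
// declare a function.
int value_initialised_sum() {
    Pod a{};                                 // members zero-initialised
    int n{};                                 // 0
    std::unique_ptr<Pod> b(new Pod());       // value-initialised: zeroed
    std::unique_ptr<int[]> c(new int[4]());  // value-initialised array: all zero
    return a.x + a.y + n + b->x + b->y + c[0] + c[3];
}
```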

If we can't find a better way to spell out the desire for value or default construction when defining variables I don't think this is reasonable to add and would suggest only the library feature, i.e. a new constructor/resize tagged to use default construction. I think it would also be good to check for the T(std::default_init) constructor but this is maybe less important. One drawback with not having this feature is that people may think it exists and write:

vector<Point> ps(100, std::default_init());

If there is no special handling this would still call T() ignoring the other constructor, and there would still be no way to set up a buffer of points without zeroing it.

This type of buffer handling is typical when interacting with libraries such as OpenGL through a C++ wrapper layer, where performance is often critical.
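Until something like std::default_init exists, one library-only workaround that is sometimes used is an allocator adaptor whose argument-less construct() default-initialises. The sketch below shows that technique, not the std::default_init proposal itself. The names default_init_allocator and uninit_vector are illustrative. The resulting vector type also differs from std::vector<T>, so it does not interoperate with APIs that expect the default allocator.

```cpp
#include <cstddef>
#include <cstring>
#include <memory>
#include <new>
#include <type_traits>
#include <utility>
#include <vector>

// Allocator adaptor whose argument-less construct() default-initialises
// instead of value-initialising, so vector(n) and resize(n) leave trivially
// constructible elements uninitialised rather than zeroing them.
template <class T, class A = std::allocator<T>>
class default_init_allocator : public A {
    using traits = std::allocator_traits<A>;

public:
    template <class U>
    struct rebind {
        using other = default_init_allocator<U, typename traits::template rebind_alloc<U>>;
    };

    using A::A;

    template <class U>
    void construct(U* p) noexcept(std::is_nothrow_default_constructible<U>::value) {
        ::new (static_cast<void*>(p)) U;  // default-initialisation, no zeroing
    }

    template <class U, class... Args>
    void construct(U* p, Args&&... args) {
        traits::construct(static_cast<A&>(*this), p, std::forward<Args>(args)...);
    }
};

template <class T>
using uninit_vector = std::vector<T, default_init_allocator<T>>;

// Usage: size the buffer without zeroing, then let "legacy" code fill it.
uninit_vector<int> copy_in(const int* src, std::size_t n) {
    uninit_vector<int> v(n);  // elements default-initialised
    if (n != 0) {
        std::memcpy(v.data(), src, n * sizeof(int));  // int is trivially copyable
    }
    return v;
}
```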

Nicol Bolas

Jul 26, 2017, 6:50:22 PM7/26/17

to ISO C++ Standard - Future Proposals

The point that's unclear is (oddly enough) whether "reallocation" is just about the behavior of iterators/references/pointers, or if it is also about actually allocating memory and all of the side-effects thereof which are not just iterators/references/pointers. The point is that the standard doesn't explicitly state that "no reallocation" means no allocating memory.

That's clearly the intent, but I'm not sure if it's well-specified.

But the thing I pointed out, with two sections defining "reallocation" differently, that's either a defect or an editorial issue.

Thiago Macieira

Jul 26, 2017, 9:50:20 PM7/26/17

to std-pr...@isocpp.org

On quarta-feira, 26 de julho de 2017 15:50:21 PDT Nicol Bolas wrote:
> The point that's unclear is (oddly enough) whether "reallocation" is *just*
> about the behavior of iterators/references/pointers, or if it is also about
> actually allocating memory and all of the side-effects thereof which are
> not just iterators/references/pointers. The point is that the standard
> doesn't *explicitly* state that "no reallocation" means no allocating
> memory.

I don't think the allocation is the issue. The issue is the lifetime begin of
the objects in the vector, even if trivial.

Suppose this was allowed:

std::vector<char> v;
v.reserve(1);
v.data()[0] = 'a';
v.resize(1);
return v[0];

What does this return?

Note that resize() will do
new (&array[i]) T{};

Which will value-initialise (zero-initialise) the array, which means this
function returns 0.

Even if we had a resize() that did default initialisation, the compiler sees
this as the beginning of the lifetime of the char, which means the value that
was there can be discarded as a dead store. In this case, this function would
return an unspecified value and depending on it is UB.

So, tell me again: why does anyone want to access the vector's storage past
the end of the vector?

--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center
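For contrast, the well-defined version of the snippet in Thiago's message simply swaps the order: create the element first, then write to it. A minimal sketch (the function name is illustrative):

```cpp
#include <vector>

// Well-defined ordering: make the element exist first (resize() value-
// initialises it to '\0'), then write through data().
char write_after_resize() {
    std::vector<char> v;
    v.reserve(1);
    v.resize(1);        // v[0] now exists and holds '\0'
    v.data()[0] = 'a';  // writing to a live element is fine
    return v[0];
}
```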

stevem...@gmail.com

Jul 26, 2017, 10:10:26 PM7/26/17

to ISO C++ Standard - Future Proposals

There are exactly 3 different things being discussed in this thread

1) Allowing .data() to return a pointer that can be used properly for pointer arithmetic, but with any dereference of it being undefined, i.e. it points to the buffer, but that's it; if you dereference it, it's U.B.

2) Default init the vector instead of value init. This could be a 'real' solution to the next case (#3)

3) Using the .data() pointer to allow direct filling of the buffer, which is useful in cases where a C API is exposed. But it has been pointed out that the vector can't know about someone modifying it externally, so resize and similar operations will overwrite changes you make, and letting people do this is probably bad. (The solution was therefore to make dereferencing .data()'s pointer past .size() U.B.)

I initially proposed #1; after further discussion, I suggested we add #3 for performance reasons for C APIs. In my mind there should be two separate solutions for these. This thread is getting horribly off topic by bouncing between all three of these. Does anyone at least disagree with changing the wording of .data() to allow it to be valid for pointer arithmetic, as long as the vector has a capacity > 0?

Nicol Bolas

Jul 26, 2017, 10:31:31 PM7/26/17

to ISO C++ Standard - Future Proposals, stevem...@gmail.com

On Wednesday, July 26, 2017 at 10:10:26 PM UTC-4, stevem...@gmail.com wrote:

There are exactly 3 different things being discussed in this thread
1) Allowing .data() to return a pointer that can be used properly for pointer arithmatic, but have any dereference of it be undefined. I.E it points to the buffer, but that's it, if you use it U.B
2) Default init the vector instead of value init. This could be a 'real' solution to the next case (#3)
3) Using the .data() pointer to allow direct filling of the buffer, useful in cases where a C api is exposed. But this has been mentioned that the vector can't know of someone modifying it externally, so resize and things will overwrite changes you make and letting people do this is probably bad. (the solution was therefore to make dereferencing .data()'s pointer past .size() be U.B)

I initially proposed #1, after further discussion, i suggested we add #3 for performance reasons for C apis. There should be two seperate solutions in my mind for these.

There have been only two problems presented here:

1: Using the value of an empty `vector`'s pointer as some kind of marker, a pointer to hold until its filled in later.
2: Filling in the contents of a `vector` without having to initialize those contents twice.

These two problems have absolutely nothing to do with one another. As such, they require separate solutions.

Also, I have no idea how "make dereferencing .data()'s pointer past .size() be U.B" is a solution to anything. The whole point of this thread is to define behavior, not to make new undefined behavior.

This thread is getting horribly off topic by bouncing between all three of these. Does anyone at least disagree with chaning the wording of .data() to allow it to be valid for pointer arithmatic, as long as the vector has a capacity > 0?

What do you mean by "valid for pointer arithmatic[sic]"? Pointer arithmetic in C++ is (at present) only defined for arrays of actual live objects, which `vector` creates when you insert elements. Well, the return value of `data()` for an empty `vector` would (at best) be just a memory allocation; it's not an array of live objects. So you can't do pointer arithmetic on it.

To change that would require changing the meaning of pointer arithmetic, which is a highly complex change. And while I would like to see those rules loosened up, even I don't want to allow you to just arbitrarily perform pointer arithmetic through random memory.

Nicol Bolas

Jul 26, 2017, 10:34:28 PM7/26/17

to ISO C++ Standard - Future Proposals

On Wednesday, July 26, 2017 at 9:50:20 PM UTC-4, Thiago Macieira wrote:

On quarta-feira, 26 de julho de 2017 15:50:21 PDT Nicol Bolas wrote:
> The point that's unclear is (oddly enough) whether "reallocation" is *just*
> about the behavior of iterators/references/pointers, or if it is also about
> actually allocating memory and all of the side-effects thereof which are
> not just iterators/references/pointers. The point is that the standard
> doesn't *explicitly* state that "no reallocation" means no allocating
> memory.

I don't think the allocation is the issue. The issue is the lifetime begin of
the objects in the vector, even if trivial.

I think you've kind of gotten off the off-topic discussion. "The issue" I was discussing was in response to a post about a possible standard defect in how "reallocation" is defined.

We're pretty much all in agreement that there is no functional way to allow you to just pretend that elements of a `vector` exist through `reserve` without making them actually exist through `insertion`/`resize`/etc.

Thiago Macieira

Jul 26, 2017, 10:46:53 PM7/26/17

to std-pr...@isocpp.org

On quarta-feira, 26 de julho de 2017 19:10:25 PDT stevem...@gmail.com

> 1) Allowing .data() to return a pointer that can be used properly for
> pointer arithmatic, but have any dereference of it be undefined. I.E it
> points to the buffer, but that's it, if you use it U.B

Fair enough, you can do pointer arithmetic on it up to capacity().

What's the use-case? What are you going to do with those different pointers
that you can't dereference?

> 2) Default init the vector instead of value init. This could be a 'real'
> solution to the next case (#3)
> 3) Using the .data() pointer to allow direct filling of the buffer, useful
> in cases where a C api is exposed. But this has been mentioned that the
> vector can't know of someone modifying it externally, so resize and things
> will overwrite changes you make and letting people do this is probably bad.
> (the solution was therefore to make dereferencing .data()'s pointer past
> .size() be U.B)

Agreed. So what you want is to resize() it with default initialisation (don't
memset), then pass it to the C library function.

I wrote this for QVector (resizeUninitialized), but had to remove it because
the enable_if I had broke for forward-declared element types.

> I initially proposed #1, after further discussion, i suggested we add #3
> for performance reasons for C apis. There should be two seperate solutions
> in my mind for these. This thread is getting horribly off topic by bouncing
> between all three of these. Does anyone at least disagree with chaning the
> wording of .data() to allow it to be valid for pointer arithmatic, as long
> as the vector has a capacity > 0?

Sure, I just don't see why you'd want to.

Thiago Macieira

Jul 26, 2017, 10:49:08 PM7/26/17

to std-pr...@isocpp.org

On quarta-feira, 26 de julho de 2017 19:31:30 PDT Nicol Bolas wrote:
> What do you mean by "valid for pointer arithmatic[sic]"? Pointer arithmetic
> in C++ is (at present) only defined for arrays of actual live objects,
> which `vector` creates when you insert elements.

You can do pointer arithmetic on the pointer returned by malloc(), operator
new() and the allocators up to the size you allocated (plus one). Initialising
the objects in that storage is not required.

In fact, arithmetic on those pointers is a requirement to start the lifetime
of the objects in the first place.
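This is how vector itself is built: it obtains storage first, and pointer arithmetic on that storage locates each element before the element's lifetime starts. The sketch below uses std::allocator<int>. As of C++20, the specification of std::allocator<T>::allocate says it starts the lifetime of an array object but not of its elements. That is what makes the arithmetic below well-defined. The function name is illustrative.

```cpp
#include <cstddef>
#include <memory>
#include <new>

// Raw storage from an allocator: pointer arithmetic reaches each slot before
// any int lives there, and placement new then starts each element's lifetime.
int construct_in_raw_storage(std::size_t n) {
    using Traits = std::allocator_traits<std::allocator<int>>;
    std::allocator<int> alloc;
    int* p = Traits::allocate(alloc, n);
    for (std::size_t i = 0; i < n; ++i) {
        ::new (static_cast<void*>(p + i)) int(static_cast<int>(i));
    }
    int total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += p[i];
    }
    // int is trivially destructible, so no destructor calls are needed first.
    Traits::deallocate(alloc, p, n);
    return total;
}
```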

Thiago Macieira

Jul 26, 2017, 10:50:08 PM7/26/17

to std-pr...@isocpp.org

On quarta-feira, 26 de julho de 2017 19:34:27 PDT Nicol Bolas wrote:
> I think you've kind of gotten off the off-topic discussion. "The issue" I
> was discussion was in response to a post about a possible standard defect
> in how "reallocation" is defined.
>
> We're pretty much all in agreement that there is no functional way to allow
> you to just pretend that elements of a `vector` exist through `reserve`
> without making them actually exist through `insertion`/`resize`/etc.

Yeah, I kind of got in the middle. I saw the discussion going off-topic to the
original problem and replied to the latest email trying to bring it back.

stevem...@gmail.com

Jul 26, 2017, 11:05:06 PM7/26/17

to ISO C++ Standard - Future Proposals

As Nicol states, problem 1 is: using the value of an empty `vector`'s pointer as some kind of marker, a pointer to hold until it's filled in later.

There are two immediate cases I can think of:

1) The functions makeJmpInst and makeCallInst need to be given a pointer to where the instruction will be in memory, because their encoding depends on their memory location. I.e. you cannot put them into the vector without first knowing where they will be placed.

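std::vector<uint8_t> buf;
buf.reserve(32);
uint8_t* raw_buf = buf.data();
auto jmp = makeJmpInst(raw_buf).bytes();
buf.insert(buf.end(), jmp.begin(), jmp.end());
auto call = makeCallInst(raw_buf + buf.size()).bytes();
buf.insert(buf.end(), call.begin(), call.end());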

2) It gives you a way to check whether the buffer was reallocated:

std::vector<uint8_t> buf;
buf.reserve(32);
uint8_t* raw_buf = buf.data();
buf.insert(buf.end(), {1, 2, 3});
if (buf.data() != raw_buf)
    error("oh no we got moved on insert");

Where is it stated that a pointer has to point to a valid object? Can it not just point to memory? A call to reserve is absolutely going to put an array of bytes into the underlying vector. What am I misunderstanding that disallows casting this to a T* and letting me do pointer arithmetic? If it's an issue with T*, just give me a char*.

As Nicol states, problem 2) is filling in the contents of a `vector` without having to initialize those contents twice.

Imagine any C API that looks like this:

int fillSomeBuffer(T* buf); //returns the number of T's actually put into buf

As it is, you have to do this to avoid invoking UB or overwriting elements:

std::vector<uint8_t> buf;
buf.reserve(32);
uint8_t cBuf[32];                           // temporary array for the C API to fill
int count = fillSomeBuffer(cBuf);
buf.insert(buf.end(), cBuf, cBuf + count);  // second copy, into the vector

Ew, look at how horrible that is. Surely we can do better.

Those are two real-world cases showing why something needs to change. I'm not a standardese guy; I don't know all the nuances necessary to get this done properly. I simply ask that others help guide me to a solution for these.

Nicol Bolas

Jul 26, 2017, 11:22:56 PM7/26/17

to ISO C++ Standard - Future Proposals

On Wednesday, July 26, 2017 at 10:49:08 PM UTC-4, Thiago Macieira wrote:

> On Wednesday, 26 July 2017 19:31:30 PDT Nicol Bolas wrote:
>> What do you mean by "valid for pointer arithmatic[sic]"? Pointer arithmetic
>> in C++ is (at present) only defined for arrays of actual live objects,
>> which `vector` creates when you insert elements.
>
> You can do pointer arithmetic on the pointer returned by malloc(), operator
> new() and the allocators up to the size you allocated (plus one). Initialising
> the objects in that storage is not required.
>
> In fact, arithmetic on those pointers is a requirement to start the lifetime
> of the objects in the first place.

[expr.add]/4 states:

> When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the expression P points to element x[i] of an array object x with n elements, the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) element x[i + j] if 0 ≤ i + j ≤ n; otherwise, the behavior is undefined. Likewise, the expression P - J points to the (possibly-hypothetical) element x[i − j] if 0 ≤ i − j ≤ n; otherwise, the behavior is undefined.

If P is a freshly allocated piece of memory, it does not "point to element x[i] of an array object x with n elements". Therefore "the behavior is undefined".

Yes, this is powerfully dumb. Yes, this means it is impossible to implement `std::vector`. Yes, this is actually how the standard works at present.
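A minimal sketch of the allocate-then-create pattern that `vector` implementations rely on, so that pointer arithmetic only ever runs over live objects. The function `sum_constructed` is hypothetical. Nicol's point still applies: under the C++17 wording even this is arguably not blessed, because no array object is ever created; closing that gap is what P0593 (implicit object creation, adopted for C++20) was written for.

```cpp
#include <cstddef>
#include <memory>

// Hypothetical helper: obtains raw storage, creates n ints in it, and only
// then does pointer arithmetic over the live objects.
int sum_constructed(std::size_t n) {
    std::allocator<int> alloc;
    int* p = alloc.allocate(n);                  // raw storage: no int objects yet
    std::uninitialized_value_construct_n(p, n);  // create n ints (value 0)
    for (std::size_t i = 0; i < n; ++i)
        p[i] = static_cast<int>(i);              // assign to live objects
    int total = 0;
    for (int* it = p; it != p + n; ++it)         // arithmetic over live elements
        total += *it;
    std::destroy_n(p, n);                        // end the objects' lifetimes
    alloc.deallocate(p, n);
    return total;
}
```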

Nicol Bolas

Jul 26, 2017, 11:36:34 PM7/26/17

to ISO C++ Standard - Future Proposals, stevem...@gmail.com

On Wednesday, July 26, 2017 at 11:05:06 PM UTC-4, stevem...@gmail.com wrote:

> As nicol states problem 1) Using the value of an empty `vector`'s pointer as some kind of marker, a pointer to hold until its filled in later.
>
> There's two immediate cases i can think of
>
> 1) the functions makeJmpInst and makeCallInst need to be given a pointer to where the instruction will be in memory. Their encoding depends upon their memory location. I.E you cannot put them into the vector, without first knowing where they will be put at.
> std::vector<uint8_t> buf;
> buf.reserve(32);
> uint8_t* raw_buf = buf.data();
> buf.push_back(makeJmpInst(raw_buf).bytes());
> buf.push_back(makeCallInst(raw_buf).bytes());

So, it would be impossible to patch up the data with its actual memory address after inserting it?

> 2) gives you a way to check if the buffer was re-allocated

No, it does not. Because if two reallocations happened, it is entirely possible that `vector.data()` has the same value it used to have. Or something as simple as this:

vector<int> vec1 = ...
auto ptr = vec1.data();
vec1.push_back(0);        // Suppose this causes reallocation: `ptr` is now invalid.
vector<int> vec2 = ...    // Just so happens to have the `ptr` value.
vec1 = std::move(vec2);
ptr == vec1.data();       // Reallocation happened, but we can't tell.

> Where is it stated that a pointer has to point to a valid object, can it not just point to memory.

That isn't stated anywhere. But, nobody has suggested that this is a problem. You can have pointers to random memory. But you can't do pointer arithmetic on it.

> A call to reserve is absolutely going to put an array of bytes into the underlying vector. What am i misunderstanding that dis-allows you to cast this to a T* and let me to pointer arithmetic.

See [expr.add]/4. Pointer arithmetic does not work unless it's actually a pointer into an array of `T`s.

> If it's an issue with T*, just give me a char*.

That doesn't work either, since it's not a pointer into an array of `char`.

> As Nicol states problem 2) Filling in the contents of a `vector` without having to initialize those contents twice.
>
> ...
>
> That's two real-world cases why something needs to change. I'm not a standardese guy i don't know all the necessary nuances to get this done properly. I simply ask that others help guide me to get a solution to these.

And we already have: permit default initialization of objects. But you rejected it with the reasoning "I'm weighing the probability of this actually happening. It seems like doing something like that would come around in C++ 30, while this could be done tomorrow."

If you reject the solution that can actually work in favor of one that can't work, there's not much we can really do.

Thiago Macieira

Jul 27, 2017, 12:05:56 AM7/27/17

to std-pr...@isocpp.org

On Wednesday, 26 July 2017 20:36:33 PDT Nicol Bolas wrote:
> > 1) the functions makeJmpInst and makeCallInst need to be given a pointer
> > to where the instruction will be in memory. Their encoding depends upon
> > their memory location. I.E you cannot put them into the vector, without
> > first knowing where they will be put at.
> > std::vector<uint8_t> buf;
> > buf.reserve(32);
> > uint8_t* raw_buf = buf.data();
> > buf.push_back(makeJmpInst(raw_buf).bytes());
> > buf.push_back(makeCallInst(raw_buf).bytes());
>
> So, it would be impossible to patch up the data with its actual memory
> address after inserting it?

I might point out that "patch up the data" is the state of the art in this
technology known as "linking"...
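Thiago's linking point can be sketched for the common case of jumps inside the buffer being built. An x86 `jmp rel32` (opcode 0xE9 followed by a 4-byte little-endian displacement) is relative to the end of the instruction, so it can be emitted with a placeholder and patched once the target's offset is known, without ever needing `data()` of an empty vector. The helper names are hypothetical, and targets outside the buffer would still need a fix-up pass once the vector has stopped growing.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: emits "jmp rel32" with a zero placeholder displacement
// and returns the offset of the 4-byte displacement field.
std::size_t emit_jmp_placeholder(std::vector<std::uint8_t>& code) {
    code.push_back(0xE9);                           // jmp rel32 opcode
    const std::size_t field = code.size();
    code.insert(code.end(), 4, std::uint8_t{0});    // placeholder displacement
    return field;
}

// Hypothetical helper: patches the displacement once the target offset is
// known. rel32 is measured from the end of the instruction (field + 4), so
// only offsets within the buffer matter, not where the buffer lives.
// Assumes the displacement fits in 32 bits.
void patch_jmp(std::vector<std::uint8_t>& code, std::size_t field,
               std::size_t target) {
    const std::int64_t disp = static_cast<std::int64_t>(target) -
                              static_cast<std::int64_t>(field + 4);
    const auto bits = static_cast<std::uint32_t>(disp);  // two's-complement bits
    for (std::size_t i = 0; i < 4; ++i)                  // little-endian store
        code[field + i] = static_cast<std::uint8_t>(bits >> (8 * i));
}
```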

Edward Catmur

Jul 27, 2017, 10:47:59 AM7/27/17

to std-pr...@isocpp.org

It's not that clear to me (and I don't think I'm being intentionally obtuse). I'd hope implementations have considerable latitude over when they call into the supplied allocator, allowing them to e.g. pool nodes or node allocations if they so wish. Some operations are noexcept, meaning that they can't allocate or deallocate, but otherwise I think it should be the implementor's choice.

There are some sections (e.g. [stringbuf.virtuals]/8) where "reallocate" is clearly referring to calls to an allocator, but in the case of vector (and string) it could be read as a shorthand term for iterator/reference/pointer invalidation.

Hyman Rosen

Jul 27, 2017, 10:52:12 AM7/27/17

to std-pr...@isocpp.org

On Wed, Jul 26, 2017 at 11:05 PM, <stevem...@gmail.com> wrote:

> 2) gives you a way to check if the buffer was re-allocated
> std::vector<uint8_t> buf;
> buf.reserve(32);
> uint8_t* raw_buf = buf.data();
> buf.push_back({1,2,3});
> if(buf.data() !=raw_buf)
>    error("oh no we got moved on push_back");

No, this is no good. If the pointer in raw_buf has been deallocated, it is undefined behavior to read its value.
You could use memcmp rather than !=, though.

Ville Voutilainen

Jul 27, 2017, 10:53:13 AM7/27/17

to ISO C++ Standard - Future Proposals

On 27 July 2017 at 17:47, 'Edward Catmur' via ISO C++ Standard -
Future Proposals <std-pr...@isocpp.org> wrote:
>> The point that's unclear is (oddly enough) whether "reallocation" is just
>> about the behavior of iterators/references/pointers, or if it is also about
>> actually allocating memory and all of the side-effects thereof which are not
>> just iterators/references/pointers. The point is that the standard doesn't
>> explicitly state that "no reallocation" means no allocating memory.
>>
>> That's clearly the intent, but I'm not sure if it's well-specified.
>
>
> It's not that clear to me (and I don't think I'm being intentionally
> obtuse). I'd hope implementations have considerable latitude over when they
> call into the supplied allocator, allowing them to e.g. pool nodes or node
> allocations if they so wish. Some operations are noexcept, meaning that they
> can't allocate or deallocate, but otherwise I think it should be the
> implementor's choice.

What 'nodes' is a vector supposed to use? If you think an actual
node-based container should have liberties with when and how many nodes
it allocates, that's fine and certainly worth exploring, but I don't see
what that has to do with vector.

Ville Voutilainen

Jul 27, 2017, 10:54:35 AM7/27/17

to ISO C++ Standard - Future Proposals

Incorrect. "Indirection through an invalid pointer value and passing an
invalid pointer value to a deallocation function have undefined behavior.
Any other use of an invalid pointer value has implementation-defined
behavior."

Edward Catmur

Jul 27, 2017, 10:56:10 AM7/27/17

to std-pr...@isocpp.org

On Thu, Jul 27, 2017 at 3:31 AM, Nicol Bolas <jmck...@gmail.com> wrote:

> On Wednesday, July 26, 2017 at 10:10:26 PM UTC-4, stevem...@gmail.com wrote:
>> There are exactly 3 different things being discussed in this thread:
>> 1) Allowing .data() to return a pointer that can be used properly for pointer arithmetic, but have any dereference of it be undefined, i.e. it points to the buffer, but that's it; if you use it, U.B.
>> 2) Default init the vector instead of value init. This could be a 'real' solution to the next case (#3).
>> 3) Using the .data() pointer to allow direct filling of the buffer, useful in cases where a C API is exposed. But it has been mentioned that the vector can't know of someone modifying it externally, so resize and similar operations will overwrite changes you make, and letting people do this is probably bad. (The solution was therefore to make dereferencing .data()'s pointer past .size() be U.B.)
>>
>> I initially proposed #1; after further discussion, I suggested we add #3 for performance reasons for C APIs. There should be two separate solutions in my mind for these.
>
> There have been only two problems presented here:
>
> 1: Using the value of an empty `vector`'s pointer as some kind of marker, a pointer to hold until its filled in later.
> 2: Filling in the contents of a `vector` without having to initialize those contents twice.
>
> These two problems have absolutely nothing to do with one another. As such, they require separate solutions.

Definitely.

> Also, I have no idea how "make dereferencing .data()'s pointer past .size() be U.B" is a solution to anything. The whole point of this thread is to define behavior, not to make new undefined behavior.

That's already UB (by implication, since [data(), data() + size()) is a valid range). I don't think there's any reason to think making data() stable between reallocations should affect that.

>> This thread is getting horribly off topic by bouncing between all three of these. Does anyone at least disagree with changing the wording of .data() to allow it to be valid for pointer arithmatic, as long as the vector has a capacity > 0?
>
> What do you mean by "valid for pointer arithmatic[sic]"? Pointer arithmetic in C++ is (at present) only defined for arrays of actual live objects, which `vector` creates when you insert elements. Well, the return value of `data()` for an empty `vector` would (at best) be just a memory allocation; it's not an array of live objects. So you can't do pointer arithmetic on it.

At present, you can add 0 to it (and, I think, subtract 0 from it), which is a degenerate form of arithmetic. You can also subtract it from itself, giving 0.

It would not be necessary for the limits of pointer arithmetic to change to give a stability guarantee; you would have a data() pointer that you can still only use in degenerate pointer arithmetic, but one that is guaranteed to remain stable across calls to data(). I'm mostly in favor of such a guarantee, but since it would restrict implementor freedom you should consider the views of implementors.

After all, there must be *some* reason that implementors do not already choose to give you data() pointer stability.

Edward Catmur

Jul 27, 2017, 11:01:25 AM7/27/17

to std-pr...@isocpp.org

That was an analogy, I suppose, to argue that taking liberties over when to call into the allocator can be desirable.

Thiago Macieira

Jul 27, 2017, 12:01:14 PM7/27/17

to std-pr...@isocpp.org

That was the point of this entire discussion: you can use the .data() pointer
*because* you know it's valid. You can't dereference it yet because the data
there has not begun its lifetime, though.

Hyman Rosen

Jul 27, 2017, 12:04:23 PM7/27/17

to std-pr...@isocpp.org

On Thu, Jul 27, 2017 at 10:54 AM, Ville Voutilainen <ville.vo...@gmail.com> wrote:

> Incorrect. "Indirection through an invalid pointer value and passing an
> invalid pointer value to a deallocation function have undefined behavior.
> Any other use of an invalid pointer value has implementation-defined
> behavior."

"36) Some implementations might define that copying an invalid pointer value causes a system-generated runtime fault."
The standard doesn't say what a "runtime fault" is - this is the only use of the word "fault" in the standard.
The typical case would be that such a fault crashes the program, so what are the semantics of the sample code?

Hyman Rosen

Jul 27, 2017, 12:08:32 PM7/27/17

to std-pr...@isocpp.org

On Thu, Jul 27, 2017 at 12:01 PM, Thiago Macieira <thi...@macieira.org> wrote:

> That was the point of this entire discussion: you can use the .data() pointer
> *because* you know it's valid. You can't dereference it yet because the data
> there has not begun its lifetime, though.

In the posted code, someone was trying to check whether reallocation had occurred by doing

void *old_pointer = v.data();
// Do a bunch of stuff
if (v.data() == old_pointer) { /* no reallocation has occurred */ }

But if reallocation had occurred, old_pointer might hold an invalid pointer value, and trying
to read it could cause a "system-generated runtime fault".

Thiago Macieira

Jul 27, 2017, 12:17:16 PM7/27/17

to std-pr...@isocpp.org

On Thursday, 27 July 2017 09:08:09 PDT Hyman Rosen wrote:
> In the posted code, someone was trying to check whether reallocation had
> occurred by doing
>
> void *old_pointer = v.data();
> // Do a bunch of stuff
> if (v.data() == old_pointer) { /* no reallocation has occurred */ }
>
> But if reallocation had occurred, old_pointer might hold an invalid pointer
> value, and trying
> to read it could cause a "system-generated runtime fault".

Right, you can't do that.

This could happen on protected-mode 16- and 32-bit x86 with FAR pointers. The
upper 16 bits of the pointer are the segment descriptor. But you cannot load
them into a segment register if the LDT or GDT lacks an entry for that
segment: a GPF occurs if you do.

The compiler could implement the above comparison without loading into the
registers, just performing a plain 32- or 48-bit memory comparison. Or it
could load them, you don't know.

Hence, implementation-defined.

Conclusion: you cannot portably determine if the old pointer is the same as
the new one because you can't use the old pointer in the first place.
(But everyone does it)

Hyman Rosen

Jul 27, 2017, 12:29:43 PM7/27/17

to std-pr...@isocpp.org

On Thu, Jul 27, 2017 at 12:16 PM, Thiago Macieira <thi...@macieira.org> wrote:

> Conclusion: you cannot portably determine if the old pointer is the same as
> the new one because you can't use the old pointer in the first place.
> (But everyone does it)

You could use memcmp to compare the pointers byte by byte.

Viacheslav Usov

Jul 27, 2017, 12:30:39 PM7/27/17

to ISO C++ Standard - Future Proposals

On Thu, Jul 27, 2017 at 6:01 PM, Thiago Macieira <thi...@macieira.org> wrote:

> That was the point of this entire discussion: you can use the .data() pointer *because* you know it's valid. You can't dereference it yet because the data there has not begun its lifetime, though.

If we "know" the pointer is valid, and the data type has vacuous initialisation (e.g., a char), then you can dereference the pointer, since the lifetime has started. See [basic.life].

Cheers,

V.

Thiago Macieira

Jul 27, 2017, 12:37:27 PM7/27/17

to std-pr...@isocpp.org

Bitwise comparison success is not necessary for equality. Now think of
real-mode x86, where you have 32 bits in a FAR pointer, of which only 20
determine the actual address in the low megabyte of RAM.

I'm not sure we can even say bitwise comparison success is sufficient: is it
possible that two pointers are bitwise equal but not really equal?
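Thiago's real-mode example can be checked with a little arithmetic: the physical address is segment × 16 + offset, so different FAR-pointer bit patterns can name the same byte. A minimal sketch using that standard segment:offset translation; the helper names are hypothetical, and it models only the address arithmetic, not any compiler's pointer representation.

```cpp
#include <cstdint>

// Raw 32-bit real-mode FAR pointer bits: segment high, offset low.
constexpr std::uint32_t far_bits(std::uint16_t segment, std::uint16_t offset) {
    return (static_cast<std::uint32_t>(segment) << 16) | offset;
}

// Physical address the CPU forms: segment * 16 + offset. (The 8086's wrap at
// 1 MiB, and the A20 line on later CPUs, are not modeled here.)
constexpr std::uint32_t linear_address(std::uint16_t segment, std::uint16_t offset) {
    return (static_cast<std::uint32_t>(segment) << 4) + offset;
}

// Two FAR pointers with different bits that address the same byte, 0x12345.
static_assert(far_bits(0x1234, 0x0005) != far_bits(0x1000, 0x2345), "bits differ");
static_assert(linear_address(0x1234, 0x0005) == linear_address(0x1000, 0x2345),
              "same physical address");
```

So `memcmp` on the two pointers would report "different" even though both refer to the same memory.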

Thiago Macieira

Jul 27, 2017, 12:38:07 PM7/27/17

to std-pr...@isocpp.org

On Thursday, 27 July 2017 09:30:36 PDT Viacheslav Usov wrote:
> If we "know" the pointer is valid, and the data type has vacuous
> initialisation (e.g., a char), then you can dereference the pointer, since
> the lifetime has started. See [basic.life].

Indeed.

But if vector new()s again, then a new lifetime starts and the compiler could
do dead-store elimination on everything that happened before.

Nicol Bolas

Jul 27, 2017, 12:41:35 PM7/27/17

to ISO C++ Standard - Future Proposals

No, it has not. The lifetime of an object cannot be started until the object has been created, and an object is only created through the means defined in [intro.object]/1. And writing to a piece of memory is not listed there.

"Vacuous initialization" is not a free pass from object creation. It just means that the object doesn't have to be initialized; it still has to be created.

Patrice Roy

Jul 27, 2017, 1:03:39 PM7/27/17

to std-pr...@isocpp.org

It might be interesting for someone to submit a paper for SG12 to look at. A case of "you cannot portably do this (but everyone does it)" sounds right up that study group's alley.

--
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-pr...@isocpp.org.
To view this discussion on the web visit https://groups.google.com/a/isocpp.org/d/msgid/std-proposals/2233516.vx36bHqgMl%40tjmaciei-mobl1.

Arthur O'Dwyer

Jul 27, 2017, 1:09:44 PM7/27/17

to ISO C++ Standard - Future Proposals

On Wed, Jul 26, 2017 at 8:05 PM, <stevem...@gmail.com> wrote:

> As nicol states problem 1) Using the value of an empty `vector`'s pointer as some kind of marker, a pointer to hold until its filled in later.
>
> There's two immediate cases i can think of [...]
> [...] gives you a way to check if the buffer was re-allocated
>
> std::vector<uint8_t> buf;
> buf.reserve(32);
> uint8_t* raw_buf = buf.data();
> buf.push_back({1,2,3});
> if(buf.data() !=raw_buf) error("oh no we got moved on push_back");

FYI, no, if the buffer did get reallocated then this "check" would have undefined behavior. This is analogous to "checking for signed integer overflow" after you've already done the addition: it doesn't work. BUT, if buf.data() were guaranteed to return a stable value even for an empty (zero-size + nonzero-capacity) vector, then you wouldn't have to "check" at all; you'd just be guaranteed that buf.data() == raw_buf.

Here's an example of the sort of thing you'd be able to do with a data() pointer that was stable between reallocations:

#include <cstdio>
#include <vector>

int *gp = nullptr;

extern "C" void c_api_remember(int *p) { gp = p; }
extern "C" void c_api_use(int i) { std::printf("%d\n", gp[i]); }

int main() {
    std::vector<int> buf;
    buf.reserve(100);
    c_api_remember(buf.data()); // works in practice but isn't blessed by the standard
    for (int i = 0; i < 100; ++i) {
        buf.push_back(i);
        c_api_use(i);
    }
}

> Where is it stated that a pointer has to point to a valid object, can it not just point to memory.

(This line of questioning is a rabbit hole that you should not go down. Just let it drop.)

> As Nicol states problem 2) Filling in the contents of a `vector` without having to initialize those contents twice.
> Imagine any C api ever that look like this
> int fillSomeBuffer(T* buf); //returns the number of T's actually put into buf
>
> As it is you have to do this to not invoke U.B or overwrite elements
> std::vector<uint8_t> buf;
> buf.reserve(32);
> uint8_t cBuf[32];
> int count = fillSomebuffer(cBuf);
> buf.insert(cBuf, cBuf + count);
>
> Ew, look at how horrible that is. Surely we can do better.

Well, in this case the obvious way to do better is

std::vector<uint8_t> buf(32);
buf.resize(fillSomeBuffer(buf.data()));

Sure, this runs the risk of an insufficiently smart compiler initializing a few (<=32) extra bytes to all-zero-bits. But trust me, even if you use -O99 and a magic compiler from the future, your compiler still probably initializes 32 bytes to all-zeros at least a couple of times more often than it needs to, somewhere in your codebase. I really would not worry about the cost in nanoseconds because for all intents and purposes there is no cost.
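A self-contained version of the resize approach above, with a hypothetical stub standing in for `fillSomeBuffer`. The stub takes an extra capacity parameter, which is an assumption not present in the thread's signature, so that it cannot overrun the buffer.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the C API in the thread. It writes up to 5 bytes
// (1, 2, 3, ...) but never more than cap, and returns how many it wrote.
int fillSomeBuffer(std::uint8_t* buf, int cap) {
    const int n = cap < 5 ? cap : 5;
    for (int i = 0; i < n; ++i)
        buf[i] = static_cast<std::uint8_t>(i + 1);
    return n;
}

// Size the vector up front (32 zeroed bytes), let the C API write through
// data(), then shrink to the number of bytes actually written.
std::vector<std::uint8_t> read_into_vector() {
    std::vector<std::uint8_t> buf(32);
    const int count = fillSomeBuffer(buf.data(), static_cast<int>(buf.size()));
    buf.resize(static_cast<std::size_t>(count));  // shrinking does not reallocate
    return buf;
}
```

The only extra cost over the reserve-based wish is zeroing 32 bytes once, which is the cost Arthur argues is negligible.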

However, if you really really want to avoid all-bits-zero-constructing the elements of a container, that's what the C++ allocator model is for.

https://wandbox.org/permlink/WS6x1g1rdSI6Iri7

This is the second time I've given you this link. Please click on it.

–Arthur
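The wandbox link's contents aren't reproduced in the thread, so here is an independent minimal sketch of the general technique Arthur describes: an allocator adaptor whose no-argument `construct` default-initializes instead of value-initializing. The name `default_init_allocator` is an assumption for illustration. It also exhibits the drawback Nicol raises in reply: every no-argument construction through this allocator becomes default-initialization.

```cpp
#include <memory>
#include <new>
#include <type_traits>
#include <utility>
#include <vector>

template <class T, class A = std::allocator<T>>
class default_init_allocator : public A {
    using traits = std::allocator_traits<A>;

public:
    template <class U>
    struct rebind {
        using other =
            default_init_allocator<U, typename traits::template rebind_alloc<U>>;
    };

    using A::A;  // inherit the underlying allocator's constructors

    // Used by vector::resize(n) and similar when no initializer is given:
    // default-initialize, so trivial types such as int are left uninitialized.
    template <class U>
    void construct(U* p) noexcept(std::is_nothrow_default_constructible<U>::value) {
        ::new (static_cast<void*>(p)) U;
    }

    // Any explicit initializer is forwarded to the underlying allocator.
    template <class U, class... Args>
    void construct(U* p, Args&&... args) {
        traits::construct(static_cast<A&>(*this), p, std::forward<Args>(args)...);
    }
};
```

With `std::vector<uint8_t, default_init_allocator<uint8_t>>`, `resize(32)` reserves and "creates" the bytes without zeroing them, and the C API can then write through `data()`. Reading an element before writing it would still be reading an indeterminate value.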

Howard Hinnant

Jul 27, 2017, 1:18:49 PM7/27/17

to std-pr...@isocpp.org

On Jul 27, 2017, at 12:16 PM, Thiago Macieira <thi...@macieira.org> wrote:
>
> On Thursday, 27 July 2017 09:08:09 PDT Hyman Rosen wrote:
>> In the posted code, someone was trying to check whether reallocation had
>> occurred by doing
>>
>> void *old_pointer = v.data();
>> // Do a bunch of stuff
>> if (v.data() == old_pointer) { /* no reallocation has occurred */ }
>>
>> But if reallocation had occurred, old_pointer might hold an invalid pointer
>> value, and trying
>> to read it could cause a "system-generated runtime fault".
>
> Right, you can't do that.
>
> This could happen on protected-mode 16- and 32-bit x86 with FAR pointers. The
> upper 16 bits of the pointer are the segment descriptor. But you cannot load
> them into a segment register if the LDT or GDT lacks an entry for that
> segment: a GPF occurs if you do.
>
> The compiler could implement the above comparison without loading into the
> registers, just performing a plain 32- or 48-bit memory comparison. Or it
> could load them, you don't know.
>
> Hence, implementation-defined.
>
> Conclusion: you cannot portably determine if the old pointer is the same as
> the new one because you can't use the old pointer in the first place.
> (But everyone does it)

If Herb’s spaceship operator gets passed, you would be able to compare the pointers using that. (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0515r1.pdf) If for no other reason than a total ordering among pointers, we should pass P0515.

Howard

Ville Voutilainen

Jul 27, 2017, 1:21:56 PM7/27/17

to ISO C++ Standard - Future Proposals

On 27 July 2017 at 20:03, Patrice Roy <patr...@gmail.com> wrote:
> It might be interesting for someone to submit a paper for SG12 to look at. A
> case of "you cannot portably do this (but everyone does it)" sounds right up
> that study group's alley.

I doubt that. We know that the behavior is implementation-defined because
platforms where it used to matter were once somewhat common, and out of
concern that they might become common again, we haven't nailed that
behavior down.

Ville Voutilainen

Jul 27, 2017, 1:22:59 PM7/27/17

to ISO C++ Standard - Future Proposals

On 27 July 2017 at 20:09, Arthur O'Dwyer <arthur....@gmail.com> wrote:
> On Wed, Jul 26, 2017 at 8:05 PM, <stevem...@gmail.com> wrote:
>> As nicol states problem 1) Using the value of an empty `vector`'s pointer
>> as some kind of marker, a pointer to hold until its filled in later.
>>
>> There's two immediate cases i can think of [...]
>> [...] gives you a way to check if the buffer was re-allocated
>> std::vector<uint8_t> buf;
>> buf.reserve(32);
>> uint8_t* raw_buf = buf.data();
>> buf.push_back({1,2,3});
>> if(buf.data() !=raw_buf)
>> error("oh no we got moved on push_back");
>
> FYI, no, if the buffer did get reallocated then this "check" would have
> undefined behavior.

This is, again, incorrect, so you might as well all stop repeating that claim.

Arthur O'Dwyer

Jul 27, 2017, 1:37:08 PM7/27/17

to ISO C++ Standard - Future Proposals

On Thu, Jul 27, 2017 at 10:03 AM, Patrice Roy <patr...@gmail.com> wrote:
> 2017-07-27 12:16 GMT-04:00 Thiago Macieira <thi...@macieira.org>:
>>
>> Conclusion: you cannot portably determine if the old pointer is the same as
>> the new one because you can't use the old pointer in the first place.
>> (But everyone does it)
>
> It might be interesting for someone to submit a paper for SG12 to look at.
> A case of "you cannot portably do this (but everyone does it)" sounds right
> up that study group's alley.

IMO, using a pointer's value after free() is a pretty well established source of exploitable UB in the past few years. (I mean "exploitable" in the sense that modern compilers exploit its optimization potential. Of course if you're using UB you might get some security vulnerabilities as well, but that's not my department.) See for example "Winner #2" here:
https://blog.regehr.org/archives/767
John Regehr will be one of the speakers at CppCon 2017. :)

On the related topic of "you cannot portably do this (but everyone does it)" — a.k.a. "std::vector is not implementable (but everyone implements it)" — also at this year's CppCon, Nicolai Josuttis has submitted a talk titled "Why push_back(), optional, and variant are broken and why std::launder() doesn't help". If the myriad rabbit-holes in this thread interest you, you'll probably be interested in that talk.

@Howard Hinnant: actually the "total order" on pointers provided by operator<=> (and also by the existing std::less<T*>) isn't good enough to make the OP's code work. What you need in order to make the use-after-free code work is a total order even on pointer-shaped bit-patterns that are no longer valid due to having been freed. I admit I can't read Herb's mind, but I am fairly confident that he is not intending to imply well-defined behavior for this code:

int *p = new int[10];
int *q = new int[10];
printf("%d\n", (int)(p != q)); // currently well-defined, prints "1"
delete [] p;
delete [] q;
printf("%d\n", (int)(p != q)); // print "1", or keep the current undefined behavior?

If we want to make this code well-defined, we'll have to go back and teach Clang that it's well-defined; see the URL linked above for an example of how Clang exploits this undefined behavior for optimization. (This is awkward but certainly not a killer; C++17 well-defined a ton of things that used to be exploitable UB, and there's nothing stopping us from continuing in that vein a little further.)

–Arthur

Nicol Bolas

Jul 27, 2017, 1:39:06 PM7/27/17

to ISO C++ Standard - Future Proposals

On Thursday, July 27, 2017 at 1:09:44 PM UTC-4, Arthur O'Dwyer wrote:

> However, if you really really want to avoid all-bits-zero-constructing the elements of a container, that's what the C++ allocator model is for.
>
> https://wandbox.org/permlink/WS6x1g1rdSI6Iri7
>
> This is the second time I've given you this link. Please click on it.

I'm not sure what that's supposed to prove. Yes, if you write your own allocator, you can turn value initialization into default initialization. But:

1: This requires using your own allocator. So `vector`s provided by other APIs or whatever are non-workable with this.
2: It turns all value initialization into default initialization. So it's no longer possible to value initialize elements at all.

Nicol Bolas

Jul 27, 2017, 1:43:40 PM7/27/17

to ISO C++ Standard - Future Proposals

On Thursday, July 27, 2017 at 12:37:27 PM UTC-4, Thiago Macieira wrote:

> On Thursday, 27 July 2017 09:29:20 PDT Hyman Rosen wrote:
>> On Thu, Jul 27, 2017 at 12:16 PM, Thiago Macieira <thi...@macieira.org>
>> wrote:
>>> Conclusion: you cannot portably determine if the old pointer is the same
>>> as the new one because you can't use the old pointer in the first place.
>>> (But everyone does it)
>>
>> You could use memcmp to compare the pointers byte by byte.
>
> Bitwise comparison success is not necessary for equality. Now think of
> real-mode x86, where you have 32 bits in a FAR pointer, of which only 20
> determine the actual address in the low megabyte of RAM.
>
> I'm not sure we can even say bitwise comparison success is sufficient: is it
> possible that two pointers are bitwise equal but not really equal?

That's essentially irrelevant for this question because you *cannot* use pointer equality as a test to see if reallocation happened.

As I pointed out in one of my replies, if two reallocations happened, it is possible that the second new pointer is the same as the old one. Or if you reallocate and then move a new vector into it, they might be the same.

It is folly to even attempt to use pointer equality to test reallocation. It's far better to simply check the capacity (but even that isn't necessarily guaranteed, since you can shrink-to-fit or move from something).
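Nicol's suggestion to reason about capacity instead of comparing old pointers can be sketched as deciding *before* the insertion whether it can reallocate. The standard says an insertion reallocates only if the new size exceeds the old capacity, so no possibly-invalid pointer ever needs to be read. The helper `append_without_reallocation` is hypothetical, and, as Edward notes upthread, whether "no reallocation" also means "no allocator call" is a separate question.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: appends n copies of value and returns true exactly
// when the append could not reallocate. The decision is made before the
// insertion, from size() and capacity(), so no stale pointer is ever used.
template <class T>
bool append_without_reallocation(std::vector<T>& v, std::size_t n, const T& value) {
    const bool fits = v.size() + n <= v.capacity();
    v.insert(v.end(), n, value);
    return fits;
}
```

When it returns `true`, pointers and iterators taken before the call are still valid, which is the guarantee the pointer-equality "check" was trying to recover after the fact.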

Patrice Roy

Jul 27, 2017, 1:58:16 PM7/27/17

to std-pr...@isocpp.org

@Ville : Good point

@Arthur : I understand what you mean; still, if it's common practice, however misguided language-wise, it might deserve some scrutiny, including clear exposure of the expected use-cases and alternative strategies. It would require someone caring enough about the issue to put something down in writing, obviously (I don't, but I'd be willing to read such a paper, to better grasp the perspective of such a proposer).

Hyman Rosen

Jul 27, 2017, 1:58:20 PM7/27/17

to std-pr...@isocpp.org

On Thu, Jul 27, 2017 at 1:22 PM, Ville Voutilainen <ville.vo...@gmail.com> wrote:

>> FYI, no, if the buffer did get reallocated then this "check" would have
>> undefined behavior.
>
> This is, again, incorrect, so you might as well all stop repeating that claim.

It was undefined behavior in C++03. There are still plenty of people using C++03 compilers (my own large employer included).
Now it's not undefined but it can cause a "system-generated runtime fault". That sounds like a distinction without a difference.

Nicol Bolas

Jul 27, 2017, 2:05:32 PM7/27/17

to ISO C++ Standard - Future Proposals

On Thursday, July 27, 2017 at 1:58:20 PM UTC-4, Hyman Rosen wrote:

> On Thu, Jul 27, 2017 at 1:22 PM, Ville Voutilainen <ville.vo...@gmail.com> wrote:
> > > FYI, no, if the buffer did get reallocated then this "check" would have
> > > undefined behavior.
> >
> > This is, again, incorrect, so you might as well all stop repeating that claim.
>
> It was undefined behavior in C++03. There are still plenty of people using C++03 compilers (my own large employer included).

Sure, but we're talking about an extension to the existing standard. And that existing standard is C++14, soon to be replaced by C++17.

We can't make proposals relative to the behavior of C++03. So its behavior is rather irrelevant to this issue.

> Now it's not undefined but it can cause a "system-generated runtime fault". That sounds like a distinction without a difference.

Oh, there's a distinction. Implementation-defined means that you can pick implementations which define it not to do things like cause faults.

Thiago Macieira

Jul 27, 2017, 2:17:12 PM7/27/17

to std-pr...@isocpp.org

On Thursday, 27 July 2017 09:41:34 PDT Nicol Bolas wrote:
> No, it has not. The lifetime of an object cannot be started until the
> object has been created, and an object is only created through the means
> defined in [intro.object]/1. And writing to a piece of memory is not listed
> there.
>
> "Vacuous initialization" is not a free pass from object creation. It just
> means that the object doesn't have to be initialized; it still has to be
> *created*.

You're probably right for the strict interpretation, but we've had this
discussion before.

There's nothing preventing malloc from doing new (ptr) char; on the bytes that
compose the buffer it returned before it returned it. So the compiler has to
assume that objects may have been created there if you try to read or write to
it.

It can do DSE if it sees you do new, in which case it knows that whatever was
there was a dead store.

Also, in that discussion, we mostly tended towards saying that all trivial
types should be assumed to have begun lifetime before now and will end later,
so as to support shared memory mechanisms, sockets, object allocation by C
libraries, and a number of other things that don't actually call "new" but do
imply there's an object there. But there has been no paper on the subject and
I don't think we reached a consensus -- only a trend.
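Until the implicit-creation question Thiago describes is settled, the conservative approach is to create the object explicitly with placement new, which is one of the means [intro.object]/1 does list. A minimal sketch, assuming a hypothetical `Header` type that is trivially destructible (so no destructor call is needed before the storage is released with std::free):

```cpp
#include <cstdint>
#include <cstdlib>
#include <new>

// Hypothetical trivially copyable header placed at the start of raw storage.
struct Header {
    std::uint32_t magic;
    std::uint32_t length;
};

// Begins the lifetime of a Header in `storage` with placement new.
// The returned pointer refers to an object that was actually created, so
// reads and writes through it do not depend on malloc having implicitly
// created anything there. `storage` must be suitably sized and aligned;
// memory from std::malloc(sizeof(Header)) satisfies both.
Header* create_header(void* storage) {
    return ::new (storage) Header{0xC0DEu, 0u};
}
```

The pointer returned by placement new is the one to use afterwards; the raw `void*` is only kept around so it can be passed to std::free.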

Ville Voutilainen

Jul 27, 2017, 2:17:53 PM7/27/17

to ISO C++ Standard - Future Proposals

Well, now that you mentioned your employer, I hear you have a rather more interesting mixture of operating systems, compilers, and hardware than many of us. Does reading an invalid pointer cause such a fault on any systems that you have?

Thiago Macieira

Jul 27, 2017, 2:18:16 PM7/27/17

to std-pr...@isocpp.org

On Thursday, 27 July 2017 10:18:44 PDT Howard Hinnant wrote:
> If Herb’s spaceship operator gets passed, you would be able to compare the
> pointers using that.
> (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0515r1.pdf) If
> for no other reason than a total ordering among pointers, we should pass
> P0515.

How does that work around the part of the text that Ville posted?

"Any other use of an invalid pointer value has implementation-defined
behavior."
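Whatever happens with P0515, the standard library already provides a total order over *valid* pointers: the `std::less` family is specified to yield a strict total order for pointer types, even where the built-in `<` gives an unspecified result for unrelated objects. As Thiago notes, that does not help with the invalid-pointer rule, since it still needs a valid pointer value to compare. A sketch of a membership test built on it; `points_into_elements` is a hypothetical helper name:

```cpp
#include <functional>
#include <vector>

// True if `p` points at one of v's current elements, i.e. lies in
// [v.data(), v.data() + v.size()). std::less / std::less_equal impose a
// strict total order on pointers, so this is well-defined even when `p`
// points into some unrelated object. `p` must still be a valid pointer value.
template <typename T>
bool points_into_elements(const std::vector<T>& v, const T* p) {
    const T* first = v.data();
    const T* last = first + v.size();  // adding 0 to a null data() is fine
    return std::less_equal<const T*>{}(first, p) &&
           std::less<const T*>{}(p, last);
}
```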

Ville Voutilainen

Jul 27, 2017, 2:19:50 PM7/27/17

to ISO C++ Standard - Future Proposals

On 27 July 2017 at 21:17, Thiago Macieira <thi...@macieira.org> wrote:
> Also, in that discussion, we mostly tended towards saying that all trivial
> types should be assumed to have begun lifetime before now and will end later,
> so as to support shared memory mechanisms, sockets, object allocation by C
> libraries, and a number of other things that don't actually call "new" but do
> imply there's an object there. But there has been no paper on the subject and
> I don't think we reached a consensus -- only a trend.

*ahem* http://open-std.org/JTC1/SC22/WG21/docs/papers/2017/p0593r0.html

Thiago Macieira

Jul 27, 2017, 2:20:38 PM7/27/17

to std-pr...@isocpp.org

On Thursday, 27 July 2017 10:43:40 PDT Nicol Bolas wrote:
> > Bitwise comparison success is not necessary for equality. Now think of
> > real-mode x86, where you have 32 bits in a FAR pointer, but only 20 of which
> > determine the actual address in the low megabyte of RAM.
> >
> > I'm not sure we can even say bitwise comparison success is sufficient: is
> > it possible that two pointers are bitwise equal but not really equal?
>
> That's essentially irrelevant for this question because you *cannot* use
> pointer equality as a test to see if reallocation happened.
>
> As I pointed out in one of my replies, if two reallocations happened, it is
> possible that the second new pointer is the same as the old one. Or if you
> reallocate and then move a new vector into it, they might be the same.

Sounds like a variant of the ABA problem. But the ABA problem is usually about
threads, where pointer equality is tested to conclude that nothing has happened.

Usually, for the result of realloc() we're interested in whether *relocation*
happened, not just reallocation. In the case of the OP: if the base pointer
addresses differ, fixups to the absolute addresses stored in the vector are
needed.
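One way to get Thiago's relocation test without ever looking at an invalidated pointer is to convert the base address to an integer *before* growing the vector and compare integers afterwards. The pointer-to-`std::uintptr_t` mapping is implementation-defined, so this sketch assumes a flat address space where address differences are meaningful, and assumes the emitted absolute addresses are stored as 64-bit values. `CodeBuffer` and `grow_and_fix_up` are hypothetical names, not from the thread. Because a single reserve() has to move the elements out of the old block before releasing it, the two blocks are live at the same time, so the ABA case Nicol describes cannot arise between the two reads.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical code buffer: raw bytes plus the offsets at which 64-bit
// absolute addresses (pointing back into the same buffer) were emitted.
struct CodeBuffer {
    std::vector<std::uint8_t> bytes;
    std::vector<std::size_t> abs_fixups;  // offsets of stored addresses
};

// Grows the buffer and rebases every stored absolute address if the
// storage moved. Only integers are kept across reserve(), so no
// invalidated pointer value is ever compared or otherwise used.
void grow_and_fix_up(CodeBuffer& cb, std::size_t new_capacity) {
    const auto old_base = reinterpret_cast<std::uintptr_t>(cb.bytes.data());
    cb.bytes.reserve(new_capacity);
    const auto new_base = reinterpret_cast<std::uintptr_t>(cb.bytes.data());
    if (old_base == new_base) {
        return;  // no relocation: stored addresses are still correct
    }
    for (std::size_t off : cb.abs_fixups) {
        std::uint64_t addr;
        std::memcpy(&addr, cb.bytes.data() + off, sizeof addr);
        addr = addr - old_base + new_base;  // unsigned wraparound cancels out
        std::memcpy(cb.bytes.data() + off, &addr, sizeof addr);
    }
}
```

memcpy is used for the stored addresses because the offsets need not be aligned for `std::uint64_t`.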

Thiago Macieira

Jul 27, 2017, 2:24:15 PM7/27/17

to std-pr...@isocpp.org

On Thursday, 27 July 2017 10:58:12 PDT Patrice Roy wrote:
> @Arthur : I understand what you mean; still, if it's common practice,
> however misguided language-wise, it might deserve some scrutiny, including
> clear exposure of the expected use-cases and alternative strategies. It
> would require someone caring enough about the issue to put something down
> in writing, obviously (I don't, but I'd be willing to read such a paper, to
> better grasp the perspective of such a proposer).

This will go back to the case of needing a replacement for the C library
realloc(), which we've had over and over and it doesn't seem WG14 is that
interested in helping us out.