Allowing uninitialized elements in std::vector

gonza...@gmail.com

Feb 11, 2014, 9:28:45 AM

to std-pr...@isocpp.org

Hello world!

I ran into the following efficiency problem when writing a C++ wrapper around a C library that uses raw buffers for communication (the library in this case is MPI):

    std::vector<T> wrapper_function() {
        std::size_t size = native_function_get_size();
        ...; // allocate memory
        T* data = ...; // get pointer to memory
        native_function(data, size, ...);
        ... // return vector<T>(data);
    }

Constraints on the problem: the C++ library user expects an interface using std::vector, while the C library exposes a "unique_ptr"-like interface using the pair (T*, size).

IMHO:

- this problem is common and relevant when writing C++ wrappers around C libraries (see example above),

- it is desirable to provide thin, safe, and efficient wrappers around C libraries, and

- there is currently no good solution available.

To the best of the author's knowledge, the most efficient solution is:

    std::vector<T> wrapper_function() {
        std::size_t size = native_function_get_size();
        std::vector<T> data_vector(size); // overhead: vector value-initializes the elements!
        native_function(data_vector.data(), size, ...);
        return data_vector;
    }

It is worth remarking that any solution considered should play well with allocators.

Proposed solution: provide a constructor and a resize method that do not perform initialization.

Bikeshed 1:

    struct uninitialized_t {};
    constexpr uninitialized_t uninitialized {};

    std::vector<T>(size, std::uninitialized)
    std::vector<T>::resize(size, std::uninitialized)

The problem could then be solved as:

    std::vector<T> wrapper_function() {
        std::size_t size = native_function_get_size();
        std::vector<T> data_vector(size, std::uninitialized);
        native_function(data_vector.data(), size, ...);
        return data_vector;
    }

Bikeshed 2:

    std::vector<T>::uninitialized_resize(size)

The problem could then be solved as:

    std::vector<T> wrapper_function() {
        std::size_t size = native_function_get_size();
        std::vector<T> data_vector; // no parentheses: "data_vector()" would declare a function
        data_vector.uninitialized_resize(size);
        native_function(data_vector.data(), size, ...);
        return data_vector;
    }

Anyhow, both alternatives allow developing efficient C++ wrappers of C libraries that maintain a native C++ interface, and discourage use of uninitialized memory by providing a verbose interface.

What do you think?

Nevin Liber

Feb 11, 2014, 10:53:09 AM

to std-pr...@isocpp.org

On 11 February 2014 08:28, <gonza...@gmail.com> wrote:

It is worth remarking that any solution considered should play well with allocators.

The issue can already be solved with allocators. See <http://stackoverflow.com/questions/15097783/value-initialized-objects-in-c11-and-stdvector-constructor/15119665#15119665> for one such method.
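For readers who do not follow the link, here is a minimal sketch of what such an allocator-based approach can look like. It is an illustration under stated assumptions, not necessarily the exact code from the linked answer: the names default_init_allocator, raw_vector and native_fill are hypothetical, and native_fill merely stands in for the C library call. The idea is an allocator adaptor whose no-argument construct() default-initializes the element instead of value-initializing it, so vector(n) and resize(n) leave scalars untouched.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical allocator adaptor: allocation is forwarded to A, but
// no-argument construction default-initializes instead of value-initializing.
template <typename T, typename A = std::allocator<T>>
class default_init_allocator : public A {
    using traits = std::allocator_traits<A>;

public:
    template <typename U>
    struct rebind {
        using other =
            default_init_allocator<U, typename traits::template rebind_alloc<U>>;
    };

    using A::A;  // inherit the underlying allocator's constructors

    // Called by vector(n) and resize(n): "new U" leaves scalars uninitialized.
    template <typename U>
    void construct(U* p) noexcept(std::is_nothrow_default_constructible<U>::value) {
        ::new (static_cast<void*>(p)) U;
    }

    // Any construction with arguments goes through the underlying allocator.
    template <typename U, typename... Args>
    void construct(U* p, Args&&... args) {
        traits::construct(static_cast<A&>(*this), p, std::forward<Args>(args)...);
    }
};

// Hypothetical stand-in for the C library: writes every element of the buffer.
inline void native_fill(int* data, std::size_t size) {
    assert(data != nullptr || size == 0);
    for (std::size_t i = 0; i < size; ++i) data[i] = static_cast<int>(i);
}

using raw_vector = std::vector<int, default_init_allocator<int>>;

inline raw_vector wrapper_function(std::size_t size) {
    raw_vector v(size);           // elements are default-initialized: no zero fill
    native_fill(v.data(), size);  // the C-style call overwrites all of them
    return v;
}
```

Note that the result type is std::vector<int, default_init_allocator<int>>, not std::vector<int>, so the allocator choice is visible in the wrapper's interface.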

--
Nevin ":-)" Liber <mailto:ne...@eviloverlord.com> (847) 691-1404

gonza...@gmail.com

Feb 11, 2014, 12:10:41 PM

to std-pr...@isocpp.org

On Tuesday, February 11, 2014 4:53:09 PM UTC+1, Nevin ":-)" Liber wrote:

The issue can already be solved with allocators. See <http://stackoverflow.com/questions/15097783/value-initialized-objects-in-c11-and-stdvector-constructor/15119665#15119665> for one such method.

Thanks for mentioning this! However the proposed approach restricts the unsafe operations to the interface with C code and returns a safe wrapper. AFAIK changing the behavior at the allocator level means that the vector would _always_ do the unsafe thing (i.e. default initialization).

Nevin Liber

Feb 11, 2014, 12:32:00 PM

to std-pr...@isocpp.org

On 11 February 2014 11:10, <gonza...@gmail.com> wrote:

On Tuesday, February 11, 2014 4:53:09 PM UTC+1, Nevin ":-)" Liber wrote:

The issue can already be solved with allocators. See <http://stackoverflow.com/questions/15097783/value-initialized-objects-in-c11-and-stdvector-constructor/15119665#15119665> for one such method.

Thanks for mentioning this! However the proposed approach restricts the unsafe operations to the interface with C code and returns a safe wrapper.

Uninitialized PODs are not unsafe.

AFAIK changing the behavior at the allocator level means that the vector would _always_ do the unsafe thing (i.e. default initialization).

Do you have an actual use case where you need to have both uninitialized and initialized PODs and cannot use something like v.resize(100, V::value_type())?

The advantages of using an allocator are that (a) you can use it right now and (b) it can be added independently to the standard library, instead of (i) waiting at least 4 years and (ii) revamping allocators and containers just to support a very tiny use case.

And if you can afford to wait that long, the argument that you really need the performance just doesn't hold much weight.

Bengt Gustafsson

Feb 11, 2014, 1:52:18 PM

to std-pr...@isocpp.org

Note that those vectors with the special allocator are not the same type as "normal" vectors so you can't make a "vector based api" and use another allocator inside the function, without having to copy the data one more time.
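The type mismatch can be sketched as follows. This is only an illustration: tagged_allocator, special_vector and to_plain_vector are hypothetical names, and the allocator does nothing special beyond being a distinct type, which is all that is needed to show the problem.

```cpp
#include <cstddef>
#include <memory>
#include <type_traits>
#include <vector>

// Hypothetical minimal allocator; it allocates exactly like std::allocator
// but is a different type.
template <typename T>
struct tagged_allocator {
    using value_type = T;

    tagged_allocator() = default;
    template <typename U>
    tagged_allocator(const tagged_allocator<U>&) noexcept {}

    T* allocate(std::size_t n) { return std::allocator<T>{}.allocate(n); }
    void deallocate(T* p, std::size_t n) noexcept {
        std::allocator<T>{}.deallocate(p, n);
    }
};

template <typename T, typename U>
bool operator==(const tagged_allocator<T>&, const tagged_allocator<U>&) noexcept {
    return true;
}
template <typename T, typename U>
bool operator!=(const tagged_allocator<T>&, const tagged_allocator<U>&) noexcept {
    return false;
}

using special_vector = std::vector<int, tagged_allocator<int>>;

// The allocator is part of the vector's type.
static_assert(!std::is_same<special_vector, std::vector<int>>::value,
              "vectors with different allocators are different types");

// Handing the data to an API that takes std::vector<int> means copying
// every element; the buffer itself cannot be transferred.
inline std::vector<int> to_plain_vector(const special_vector& v) {
    return std::vector<int>(v.begin(), v.end());
}
```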

I have suggested a trait which can be used to specify that data from a certain pair of allocators can be interchanged (i.e. the actual allocation is the same, it is the side task of allocator, initialization, which differs).

Also, it would be impossible to get any level of thread safety for uninitialized memory in the vector.

However, it would be very simple to include a resize_default_constructed() method on vector which default-initializes the elements instead of value-initializing them as resize() does. This only differs for built-in types and pointers.
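In standard terms the distinction is between default-initialization and value-initialization. A small sketch, with hypothetical helper names (default_construct, value_construct, with_ctor) that are not part of any proposal:

```cpp
#include <new>

// A class type with a user-provided default constructor.
struct with_ctor {
    int x;
    with_ctor() : x(7) {}
};

// Value-initialization, as resize(n) performs today: scalars become zero.
template <typename T>
T* value_construct(void* storage) {
    return ::new (storage) T();
}

// Default-initialization, as the suggested method would perform: scalars are
// left with an indeterminate value, class types still run their constructor.
template <typename T>
T* default_construct(void* storage) {
    return ::new (storage) T;
}
```

For with_ctor both helpers run the constructor, so x is 7 either way. For int, value_construct yields 0, while the result of default_construct must be written before it is read.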

This suggestion has been met with nothing but scorn by the majority on this list, who think that this can never be a real performance problem. This despite the fact that the current thread is the third or fourth time this issue has come up in the last months...

vadim.pet...@gmail.com

Feb 11, 2014, 4:16:32 PM

to std-pr...@isocpp.org, gonza...@gmail.com

boost::container::vector already has it.

http://www.boost.org/doc/libs/1_55_0/doc/html/container/extended_functionality.html#container.extended_functionality.default_initialialization

The feature was requested here several times but the local audience wasn't very kind to the requests.

gonza...@gmail.com

Feb 11, 2014, 7:36:13 PM

to std-pr...@isocpp.org, gonza...@gmail.com

On Tuesday, February 11, 2014 6:32:00 PM UTC+1, Nevin ":-)" Liber wrote:

Do you have an actual use case where you need to have both uninitialized and initialized PODs and cannot use something like v.resize(100, V::value_type())?

Returning a vector type with some _special_ semantics feels just wrong. I don't know what the users of my library are going to do with the vector, but requiring them to know about default/value initialization seems to be an unnecessary burden.

Bengt Gustafsson

Feb 12, 2014, 3:36:48 AM

to std-pr...@isocpp.org, gonza...@gmail.com

I don't see that your users would need to know anything special; it would be a standard vector. Only when it gets filled from the C library must the implementor of the wrapper know to call resize_default_constructed(), to avoid the overhead of clearing the memory that the C function is about to overwrite. The caller of the wrapper just sees a vector.

The beauty of resize_default_constructed() is that it allows this type of interfacing while NOT introducing any hazards if used on vectors of types with constructors and destructors.

Billy O'Neal

Feb 12, 2014, 1:00:47 PM

to std-proposals, gonza...@gmail.com

I think much of the "hate" that showed up in the other thread talking about this was that in that thread, the proposal was about removing the length check in push_back, which has negligible cost in most situations.

> The beauty of resize_default_constructed() is that it allows this type of interfacing while NOT introducing any hazards if used on vectors of types with constructors and destructors.

If this means that you can call it without breaking vector's other invariants then that'd be a good candidate to add.

Billy O'Neal

https://github.com/BillyONeal/

http://stackoverflow.com/users/82320/billy-oneal

Olaf van der Spek

Feb 12, 2014, 2:49:40 PM

to std-pr...@isocpp.org, gonza...@gmail.com

On Tuesday, February 11, 2014 3:28:45 PM UTC+1, gonza...@gmail.com wrote:

What do you think?

Personally I'm using a shared_array variant for this.

    shared_data wrapper_function() {
        shared_data d(native_function_get_size());
        native_function(d.data(), d.size(), ...);
        return d;
    }

The great thing is that the allocation/deallocation details aren't part of the type. I could for example also return a managed memory mapped file. Or a locked video surface, etc.
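The actual shared_data type is not shown in this thread, so the following is only a guess at its shape: a hypothetical sketch built on std::shared_ptr, where the deleter is type-erased and therefore not part of the type.

```cpp
#include <cstddef>
#include <memory>
#include <utility>

// Hypothetical sketch of a shared_array-like byte buffer.
class shared_data {
public:
    shared_data() = default;

    // Owns a fresh heap buffer; "new unsigned char[size]" does not zero it.
    explicit shared_data(std::size_t size)
        : ptr_(new unsigned char[size], std::default_delete<unsigned char[]>()),
          size_(size) {}

    // Adopts an existing buffer together with whatever releases it
    // (unmapping a file, unlocking a surface, ...). The deleter is stored
    // inside the shared_ptr, so it does not appear in the type.
    template <typename Deleter>
    shared_data(unsigned char* p, std::size_t size, Deleter d)
        : ptr_(p, std::move(d)), size_(size) {}

    unsigned char* data() const noexcept { return ptr_.get(); }
    std::size_t size() const noexcept { return size_; }

private:
    std::shared_ptr<unsigned char> ptr_;
    std::size_t size_ = 0;
};
```

Copies of such an object share the same buffer, and the deleter runs when the last copy goes away.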

Bengt Gustafsson

Feb 12, 2014, 4:43:10 PM

to std-pr...@isocpp.org, gonza...@gmail.com

Billy: I don't see that it would break any invariants. All elements are initialized by their default constructors, in the same way as local scalar variables are.

Olaf: I think you are referring to something similar to an array_view which was recently discussed here. This works fine for generic APIs (for instance in a range based for or to a templated algorithm) but most APIs are not generic (yet) but instead specify a specific type for their parameter, typically a vector<T> (with standard allocator).

Olaf van der Spek

Feb 21, 2014, 6:42:03 PM

to std-pr...@isocpp.org, gonza...@gmail.com

On Wednesday, February 12, 2014 10:43:10 PM UTC+1, Bengt Gustafsson wrote:

Olaf: I think you are referring to something similar to an array_view which was recently discussed here. This works fine for generic APIs (for instance in a range based for or to a templated algorithm) but most APIs are not generic (yet) but instead specify a specific type for their parameter, typically a vector<T> (with standard allocator).

I think not, though I've got such a type too. shared_data really is like shared_ptr/array.
