std::vector, std::string attach/detach external buffer


Igor Baidiuk

Oct 20, 2017, 5:40:26 AM
to ISO C++ Standard - Future Proposals

Hello,

What I regularly encounter is the need to interface with C libraries, which often use their own memory management (or at least expose such an interface) for the dynamic objects they construct.
What I also regularly encounter is that the standard library lacks tools for managing such chunks of memory. In fact, we only have std::unique_ptr<T[]>, which IMO shouldn't even have supported arrays, because it provides no means to store their size. Yet we already have containers in the STL that handle this scenario; we just need to teach them how to take their initial buffer from an external source.

Anyway, let's get to the point.
Some simpler containers, like std::vector and std::string, which are known to work on contiguous chunks of memory, could provide static member functions that adopt a buffer (1-3) and member functions that release it (4-5):
  1. std::vector<T> std::vector<T>::from_raw_array(T* pointer, size_t size, size_t capacity, Alloc allocator)
  2. std::basic_string<T> std::basic_string<T>::from_c_string(T* pointer, Alloc allocator)
  3. std::basic_string<T> std::basic_string<T>::from_raw_string(T* pointer, size_t size, Alloc allocator)
  4. T* std::vector<T>::detach_buffer() // detaches stored buffer and returns pointer to its start
  5. T* std::basic_string<T>::detach_buffer() // detaches stored buffer and returns pointer to its start

This way, we won't need to reinvent the wheel of extern strings and extern arrays over and over again. Of course, it's up to the user of such a function to guarantee that the pointer was allocated with the provided allocator or a compatible one.
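For context, here is a minimal sketch of the workaround that the proposal would make unnecessary: copying a C-allocated array into a std::vector and then freeing the original. `clib_make_squares` is a hypothetical stand-in for a C library call that returns a malloc-allocated array; it is not a real API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <vector>

// Hypothetical stand-in for a C library call that hands out a malloc'ed array.
extern "C" int* clib_make_squares(std::size_t n) {
    int* p = static_cast<int*>(std::malloc(n * sizeof(int)));
    if (p == nullptr) return nullptr;
    for (std::size_t i = 0; i < n; ++i) p[i] = static_cast<int>(i * i);
    return p;
}

// Releases memory with free(), matching the malloc() above.
struct FreeDeleter {
    void operator()(void* p) const noexcept { std::free(p); }
};

// Today's approach: hold the buffer temporarily, copy every element into the
// vector, then free the original. The proposed from_raw_array would skip the copy.
std::vector<int> copy_from_clib(std::size_t n) {
    std::unique_ptr<int, FreeDeleter> raw(clib_make_squares(n));
    if (!raw) return {};
    return std::vector<int>(raw.get(), raw.get() + n);
}
```

The unique_ptr only exists to guarantee the free() call if the vector constructor throws; note that it carries no size, which is the complaint made above.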

Bo Persson

Oct 20, 2017, 10:55:08 AM
to std-pr...@isocpp.org
On 2017-10-20 11:40, Igor Baidiuk wrote:
>
> Hello,
>
> What I regularly encounter is the need to interface with C libraries
> which often utilize their own memory management (at least have such
> interface) to manage dynamic objects constructed by them.
> What I also regularly encounter is that standard library lacks tools for
> managing such chunks of memory. In fact, we only have
> std::unique_ptr<T[]> which IMO shouldn't have even supported arrays -
> because it provides no means to store their size. Although, we in fact
> already have containers in STL to handle this scenario - just need to
> teach them how to get initial buffer from external source.
>
> Anyway, let's get to the point.
> Some simpler containers, like std::vector and std::string, which are
> known to work on contiguous chunks of memory, can provide static member
> functions
>
> 1. std::vector<T> std::vector::from_raw_array(T* pointer, size_t size,
> size_t capacity, Alloc allocator)
> 2. std::basic_string<T> std::basic_string::from_c_string(T* pointer,
> Alloc allocator)
> 3. std::basic_string<T> std::basic_string<T>::from_raw_string(T*
> pointer, size_t size, Alloc allocator)
> 4. T* std::vector::detach_buffer() // detaches stored buffer and
> returns pointer to its start
> 5. T* std::basic_string::detach_buffer() // detaches stored buffer and
> returns pointer to its start
>
> This way, we won't need to reinvent the wheel of extern strings and
> extern arrays over and over again. Of course, it's up to user of such
> function to guarantee that pointer was allocated with provided allocator
> or compatible one.
>

Unfortunately, the devil lives in the details.

If, for example, the basic_string has an allocator other than
std::allocator<char> it is no longer a std::string, but a totally
different type. That means that you cannot mix their use.

And if you get the raw buffer from a C function, it is likely allocated
with malloc. The std::allocator would use ::operator delete and not
free() to deallocate the buffer. Right now that is undefined behavior.

So, whatever you do there will still be incompatibility between the type
using internal and external memory.



Bo Persson
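The type incompatibility described above can be observed with types that already exist in C++17. The following sketch uses std::pmr::string purely as an example of a basic_string with a non-default allocator; the function names are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <string>
#include <string_view>
#include <type_traits>

// std::pmr::string is std::basic_string with a different allocator,
// so it is a distinct type from std::string.
static_assert(!std::is_same_v<std::string, std::pmr::string>);

// An interface written against std::string ...
std::size_t length_of(const std::string& s) { return s.size(); }

// ... versus one written against a non-owning view, which accepts both types.
std::size_t length_of_view(std::string_view s) { return s.size(); }

std::size_t demo() {
    std::pmr::string p = "external";
    // length_of(p);  // does not compile: no implicit conversion to std::string
    std::size_t a = length_of(std::string(p.data(), p.size()));  // explicit copy
    std::size_t b = length_of_view(p);                           // no copy
    return a + b;
}
```

Interfaces that take std::string_view sidestep part of the problem for read-only access, but they do nothing for ownership transfer.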


Igor Baidiuk

Oct 20, 2017, 11:16:14 AM
to ISO C++ Standard - Future Proposals, b...@gmb.dk
Sorry, but it seems you misinterpreted my idea a bit. I'm not proposing to make everything auto-compatible. String and vector can happily live with custom allocators, and yes, a custom allocator makes them different types.

My idea is to allow reuse of the existing vector and string types when the buffer comes from an external library. By this I mean that someone who wants this reuse would define a custom allocator type atop the external library's API and then have types like
template<typename T>
using MyVector = std::vector<T, MyCLibAllocator<T>>;

...

CLibType* pArray = nullptr;
size_t nArray = 0;

CLib_obtainArray(&pArray, &nArray);
// from_raw_array is the function proposed in the first message
auto cLibVec = MyVector<CLibType>::from_raw_array(pArray, nArray, nArray);

Vishal Oza

Oct 20, 2017, 11:31:16 AM
to ISO C++ Standard - Future Proposals
Would one example use case be cxxabi.h in GCC, which is used to get the name of an object's type at runtime but returns the demangled name in a dynamically allocated buffer?
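A sketch of that use case, assuming a GCC or Clang toolchain that provides `<cxxabi.h>` (this header is not part of standard C++). `demangled_name` and `FreeDeleter` are names invented for the example.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <memory>
#include <string>
#include <typeinfo>

struct FreeDeleter {
    void operator()(void* p) const noexcept { std::free(p); }
};

// Returns the demangled name of T, or the mangled name if demangling fails.
template <typename T>
std::string demangled_name() {
    int status = 0;
    // With a null output buffer, __cxa_demangle returns a malloc'ed string
    // that the caller must release with free().
    std::unique_ptr<char, FreeDeleter> name(
        abi::__cxa_demangle(typeid(T).name(), nullptr, nullptr, &status));
    if (status != 0 || !name) return typeid(T).name();
    return std::string(name.get());  // copy; std::string cannot adopt the buffer today
}
```

The final copy into std::string is exactly what a from_c_string with a malloc-based allocator would avoid.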

Nevin Liber

Oct 20, 2017, 11:33:55 AM
to std-pr...@isocpp.org
On Fri, Oct 20, 2017 at 10:16 AM, Igor Baidiuk <targe...@gmail.com> wrote:
> My idea is to allow reuse of existing vector and string types, when buffer comes from external library.

If a string comes from a different library, wrapping that in a std::basic_string won't work, because

s.push_back(0);

will not do the right thing.

Instead of the seemingly endless discussions of trying to shoehorn such a beast into vector/string, you may wish to consider proposing a separate managed immutable array-like type which manages a contiguous buffer and a length and gives std::array like access (begin(), end(), operator[], at(), data(), etc.) to its elements.
--
 Nevin ":-)" Liber  <mailto:ne...@eviloverlord.com>  +1-847-691-1404
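To make the suggestion above concrete, here is a rough sketch of such a managed immutable array-like type. `owned_array` and `make_owned` are hypothetical names, and the sketch assumes the buffer came from malloc and holds trivially destructible elements.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <memory>
#include <new>
#include <stdexcept>
#include <type_traits>
#include <utility>

// Hypothetical type: owns a malloc'ed buffer plus its length; read-only, no growth.
template <typename T>
class owned_array {
    static_assert(std::is_trivially_destructible_v<T>,
                  "sketch only frees memory; it does not run destructors");
public:
    // Takes ownership of `p`, which must have been allocated with malloc.
    owned_array(T* p, std::size_t n) noexcept : data_(p), size_(n) {}
    owned_array(owned_array&& o) noexcept
        : data_(std::move(o.data_)), size_(std::exchange(o.size_, 0)) {}

    const T* data() const noexcept { return data_.get(); }
    std::size_t size() const noexcept { return size_; }
    const T* begin() const noexcept { return data_.get(); }
    const T* end() const noexcept { return data_.get() + size_; }
    const T& operator[](std::size_t i) const { return data_.get()[i]; }
    const T& at(std::size_t i) const {
        if (i >= size_) throw std::out_of_range("owned_array::at");
        return data_.get()[i];
    }

private:
    struct FreeDeleter {
        void operator()(void* p) const noexcept { std::free(p); }
    };
    std::unique_ptr<T, FreeDeleter> data_;
    std::size_t size_;
};

// Helper for the example only: builds a malloc'ed copy of a C string.
inline owned_array<char> make_owned(const char* s) {
    std::size_t n = std::strlen(s);
    char* p = static_cast<char*>(std::malloc(n + 1));
    if (p == nullptr) throw std::bad_alloc();
    std::memcpy(p, s, n + 1);
    return owned_array<char>(p, n);
}
```

Such a type gives safe access and cleanup, but it offers no push_back or resize, so it covers only the read-only half of the use cases discussed in this thread.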

Ville Voutilainen

Oct 20, 2017, 11:51:01 AM
to ISO C++ Standard - Future Proposals
The committee has failed to standardize a mallocator despite that being
existing practice for years. We can ask whether many users use the
vendor-specific mallocators until the cows come home, but users won't use
it until it's portable. Many users don't even know that asking for multiple
library vendors to unify their mallocators might lead to any reasonable
outcome; they just take what they get, without realizing they could perhaps
convince library vendors to pick up this slack.

An immutable array-like type will not solve the problem of getting a buffer
and being able to use that buffer with vector-like semantics without copying
data or writing a custom allocator. The committee fails to add fairly
reasonable allocators; it seems unreasonable to expect users to write them
by hand.
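For readers unfamiliar with the term, a "mallocator" is an allocator that forwards to malloc and free. Below is a minimal hand-written sketch of the usual idiom; `Mallocator`, `MallocVector` and `first_n` are illustrative names, not standard or vendor types, and the sketch ignores over-aligned element types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <limits>
#include <new>
#include <vector>

// Minimal malloc-backed allocator (a common idiom, not a standard type).
template <typename T>
struct Mallocator {
    using value_type = T;

    Mallocator() = default;
    template <typename U>
    Mallocator(const Mallocator<U>&) noexcept {}  // allows rebinding

    T* allocate(std::size_t n) {
        if (n == 0) return nullptr;
        if (n > std::numeric_limits<std::size_t>::max() / sizeof(T))
            throw std::bad_array_new_length();
        void* p = std::malloc(n * sizeof(T));
        if (p == nullptr) throw std::bad_alloc();
        return static_cast<T*>(p);
    }

    void deallocate(T* p, std::size_t) noexcept { std::free(p); }
};

// Stateless: any two instances can free each other's memory.
template <typename T, typename U>
bool operator==(const Mallocator<T>&, const Mallocator<U>&) noexcept { return true; }
template <typename T, typename U>
bool operator!=(const Mallocator<T>&, const Mallocator<U>&) noexcept { return false; }

template <typename T>
using MallocVector = std::vector<T, Mallocator<T>>;

// Builds a vector whose storage comes from malloc/free.
inline MallocVector<int> first_n(int n) {
    MallocVector<int> v;
    for (int i = 0; i < n; ++i) v.push_back(i);
    return v;
}
```

Even with this allocator in hand, there is still no way to give the vector a buffer that a C library already malloc'ed, which is the second missing piece discussed in this thread.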

Nicol Bolas

Oct 20, 2017, 12:24:27 PM
to ISO C++ Standard - Future Proposals


On Friday, October 20, 2017 at 11:33:55 AM UTC-4, Nevin ":-)" Liber wrote:
> On Fri, Oct 20, 2017 at 10:16 AM, Igor Baidiuk <targe...@gmail.com> wrote:
> > My idea is to allow reuse of existing vector and string types, when buffer comes from external library.
>
> If a string comes from a different library, wrapping that in a std::basic_string won't work, because
>
> s.push_back(0);
>
> will not do the right thing.

Um, why not? If it's given an allocator to allocate memory, and/or a capacity with which to expand the array, why would this not work?

Oh sure, if the "external library" can't handle a string with embedded NUL characters, that's one thing. But there are plenty of "external libraries" that can handle embedded NULs.

Igor Baidiuk

Oct 20, 2017, 12:33:55 PM
to ISO C++ Standard - Future Proposals
Again, I'm not talking about any kind of allocator standardization/unification.

What I'm saying is that if we have
// clib_header.h

void* clib_allocate(size_t n);
void clib_deallocate(void*); // may be (void*, n), doesn't matter

struct clib_data { ... };
void clib_get_data(clib_data** p_data, size_t* n_data);

and if we write
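// clib_allocator.h
#include "clib_header.h"

template<typename T>
struct CLibAlloc
{
    using value_type = T;

    CLibAlloc() = default;
    template<typename U>
    CLibAlloc(const CLibAlloc<U>&) {} // lets containers rebind the allocator

    T* allocate(size_t n)
    {
        return static_cast<T*>(clib_allocate(n * sizeof(T)));
    }

    void deallocate(T* ptr, size_t)
    {
        clib_deallocate(ptr);
    }
};

// Stateless, so all instances are interchangeable.
template<typename T, typename U>
bool operator==(const CLibAlloc<T>&, const CLibAlloc<U>&) { return true; }
template<typename T, typename U>
bool operator!=(const CLibAlloc<T>&, const CLibAlloc<U>&) { return false; }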

we can do
clib_data* p_data = nullptr;
size_t n_data = 0;
clib_get_data(&p_data, &n_data);
auto datavec = std::vector<clib_data, CLibAlloc<clib_data>>::from_raw_array(p_data, n_data, n_data, CLibAlloc<clib_data>());

Frankly, I don't see why explicitly setting a custom allocator for a vector and then simply initializing it with an already existing buffer, created by the very same allocation API, should not be possible.

Ville Voutilainen

Oct 20, 2017, 12:39:05 PM
to ISO C++ Standard - Future Proposals
On 20 October 2017 at 19:33, Igor Baidiuk <targe...@gmail.com> wrote:
> Again. I'm not talking about any kind of allocator
> standardization/unification.

I am, however, suggesting that what you want to achieve needs two things:
a) a mallocator, because that's how most C libraries allocate
b) a means to transfer buffer ownership into and out of a vector, assuming that
the incoming buffer was allocated by a compatible allocator, and that the user
can know how to deallocate the outgoing buffer.

I'm not saying that we should even consider a vector<T> (note, no specific
allocator specified) to eat or regurgitate buffers that were not
std::allocated. That will not work.

It's surely possible. Whether the committee can be convinced to enable these
use cases is another matter. I think it's about time that such uses are
enabled, because it lowers the bar of adapting low-level libraries that won't
change to pure C++ into C++ code.

Nicol Bolas

Oct 20, 2017, 12:50:33 PM
to ISO C++ Standard - Future Proposals
On Friday, October 20, 2017 at 12:39:05 PM UTC-4, Ville Voutilainen wrote:
> On 20 October 2017 at 19:33, Igor Baidiuk <targe...@gmail.com> wrote:
> > Again. I'm not talking about any kind of allocator
> > standardization/unification.
>
> I am, however, suggesting that what you want to achieve needs two things:
> a) a mallocator, because that's how most C libraries allocate
> b) a means to transfer buffer ownership into and out of a vector, assuming that
> the incoming buffer was allocated by a compatible allocator, and that the user
> can know how to deallocate the outgoing buffer.

I think his point is that, while both would be great, B is at least functional without A, since you can write your own mallocator easily enough. If you have A without B, you still can't transfer ownership of the buffer to `vector`.

Personally, I think we need a type that represents a buffer, its size, its capacity, and its allocator, somewhat similar to `map::node_type`. Like `node_type`, you cannot really interact with it; you can only pass it around to functions that consume it. And it's move-only, with a non-throwing move. Unlike `node_type`, you can construct one from its components, rather than needing a `map` as a source for it.

The other issue is how to deal with `basic_string` and its ability to use small-string optimization. That is, if I have a 12-byte string and I initialize an SSO `string` from it, when does my buffer get deleted? The `basic_string` should not be required to keep my buffer around if the number of characters fits into the SSO buffer. So we'd have to say that the buffer could be deleted as early as the call that consumes it.
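For reference, the `node_type` analogy refers to the C++17 node-handle API, which already transfers ownership of storage between containers. A small sketch using real std::map members (`extract`, and `insert` taking a node handle); the `transfer` helper itself is made up for the example.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Moves the element with key `k` from `src` to `dst` without copying the
// element or reallocating its node. Returns true if it ended up in `dst`.
// If `dst` already contains `k`, the extracted element is dropped.
inline bool transfer(std::map<int, std::string>& src,
                     std::map<int, std::string>& dst, int k) {
    std::map<int, std::string>::node_type node = src.extract(k);
    if (node.empty()) return false;  // key not present in src
    return dst.insert(std::move(node)).inserted;
}
```

A buffer handle for vector and string would play the same role as `node_type` here: an opaque, move-only owner that exists only to be handed from one container to another.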

Thiago Macieira

Oct 20, 2017, 4:12:23 PM
to std-pr...@isocpp.org
On Friday, 20 October 2017 09:50:32 PDT Nicol Bolas wrote:
> The other issue is how to deal with `basic_string` and its ability to use
> small-string optimization. That is, if I have a 12-byte string and I
> initialize an SSO `string` from it, when does my buffer get deleted? The
> `basic_string` should not be required to keep my buffer around if the
> number of characters fits into the SSO buffer. So we'd have to say that the
> buffer could be deleted as early as the call that consumes it.

Right. Once you transfer ownership to the std type, it *owns* that buffer and
has the ability to free it as soon as it wants to.

using cstring = std::basic_string<char, std::char_traits<char>, mallocator>;
auto str = cstring::from_raw_string(someBuf, someLen);

If someLen is below the SSO threshold, one of two things can happen:

1) str points to your buffer, despite not needing to
2) str copies someBuf into itself and frees your buffer using
the mallocator

Note, however, the consequence:

char *ptr = str.detach_buffer();

may fail.

--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center
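Whether a given string currently lives in the small-string buffer can be checked, roughly, by testing whether data() points inside the string object. This is a diagnostic sketch only: the standard does not require SSO, and the threshold differs between implementations. `stored_in_object` is a name made up for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// True if the characters live inside the std::string object itself,
// i.e. the small-string optimization is in effect for `s`.
inline bool stored_in_object(const std::string& s) {
    auto obj = reinterpret_cast<std::uintptr_t>(&s);
    auto buf = reinterpret_cast<std::uintptr_t>(s.data());
    return buf >= obj && buf < obj + sizeof(std::string);
}
```

On mainstream implementations a short string such as "hi" typically reports true and a 100-character string reports false. In the first case there is no heap buffer to hand out, which is why a detach operation would have to allocate one.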

Thiago Macieira

Oct 20, 2017, 4:28:06 PM
to std-pr...@isocpp.org
On Friday, 20 October 2017 02:40:25 PDT Igor Baidiuk wrote:
> Hello,
>
> What I regularly encounter is the need to interface with C libraries which
> often utilize their own memory management (at least have such interface) to
> manage dynamic objects constructed by them.
[cut]

Sharing experience from Qt:

Both QString and QByteArray classes have fromRawData(): those functions make
the object point to the buffer you provide, but they don't take ownership (I
didn't finish that feature for 5.0, I'll try again in 6.0). Note also that
these two functions as they exist in Qt 5 still allocate memory, since both
classes require a control block -- something I've already fixed and have ready
for 6.0.

The feature is missing from QVector, despite QVector's implementation being
nearly identical to the other two. In fact, all three classes share the same
control block and allocation functionality. I rejected this feature in the 5.0
development since we didn't see the need for it then and because it introduced
one conditional on deallocation.

Those functions allow you to point a QString or QByteArray to strings you had
elsewhere, without forcing a copy. So long as you only use const functions,
you can use all the API that those classes provide. This is especially useful
when operating on strings that exist inside larger buffers, like the contents
of a file, which may in turn be memory-mapped for efficiency. The same technique
can be applied to read-only char or char16_t arrays.

There are some caveats, though:

1) there's no detach_buffer() equivalent, since the internal construction of
QString, QByteArray and QVector does not permit it. The buffer is contiguous
with the control block, so the caller would need to know the pointer to the
control block, not the array, in order to deallocate.

2) There's a danger with NULs: normal QString and QByteArray, like
std::string, guarantee that the string will be null-terminated. But that's not
the case when using user-provided buffers, so care must be taken when those
strings are passed to code that in turn passes them to C functions without
explicit length, like the entire POSIX API. About a decade ago, I added
QByteArray::nulTerminated() to facilitate this, but never added an equivalent
to QString, which is technically needed for the Win32 API.

See also the discussion with Niall Douglas on null-terminated filenames.

3) There's an important issue with object lifetime and plugin unloading.
Because the Qt containers are reference-counted, a fromRawData string can
continue its life for some time. This is also possible with std:: containers
due to move semantics, even returning from functions. Over the past 5 years of
Qt 5 existence, we've run into this issue a few times, where a plugin or
library had been unloaded, but there were still QString objects pointing to
where that library used to be, causing crashes later on. And those are hard to
track even with Valgrind, since it can't point to what that memory was after
the plugin was unloaded.

Note this is no different from still having polymorphic objects of a class
type that was defined in a plugin or library that was unloaded. Virtual calls
will fail.

Our solution was actually a hammer: we don't unload plugins.
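The closest standard-library analogue to a non-owning fromRawData string is std::string_view, and it shares the null-termination caveat from point 2. The sketch below uses only standard C++ (no Qt); `first_line` and `c_length` are illustrative helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <string_view>

// Returns a view of the first line of `text`; no characters are copied.
inline std::string_view first_line(std::string_view text) {
    return text.substr(0, text.find('\n'));
}

// C APIs that take a bare `const char*` need a terminator, so the view has to
// be copied into an owning, null-terminated string first.
inline std::size_t c_length(std::string_view v) {
    std::string terminated(v);  // the copy is followed by a trailing '\0'
    return std::strlen(terminated.c_str());
}
```

Passing `first_line(...).data()` straight to a C function would read past the end of the line, because the character after the view is '\n' rather than '\0'. The lifetime caveat from point 3 applies as well: the view dangles once the underlying buffer goes away.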

Nicol Bolas

Oct 20, 2017, 5:11:49 PM
to ISO C++ Standard - Future Proposals


On Friday, October 20, 2017 at 4:12:23 PM UTC-4, Thiago Macieira wrote:
> On Friday, 20 October 2017 09:50:32 PDT Nicol Bolas wrote:
> > The other issue is how to deal with `basic_string` and its ability to use
> > small-string optimization. That is, if I have a 12-byte string and I
> > initialize an SSO `string` from it, when does my buffer get deleted? The
> > `basic_string` should not be required to keep my buffer around if the
> > number of characters fits into the SSO buffer. So we'd have to say that the
> > buffer could be deleted as early as the call that consumes it.
>
> Right. Once you transfer ownership to the std type, it *owns* that buffer and
> has the ability to free it as soon as it wants to.
>
>         using cstring = std::basic_string<char, std::char_traits<char>, mallocator>;
>         auto str = cstring::from_raw_string(someBuf, someLen);
>
> If someLen is below the SSO threshold, one of two things can happen:
>
>  1) str points to your buffer, despite not needing to
>  2) str copies someBuf into itself and frees your buffer using
>     the mallocator
>
> Note, however, the consequence:
>
>         char *ptr = str.detach_buffer();
>
> may fail.

... I disagree; this must succeed. But it doesn't have to empty the buffer, just like moving from a `basic_string` can still do a copy if the string is in the SSO buffer. This should be able to allocate memory, and therefore represents a potentially throwing operation.

This is another reason why I feel that the "detach_buffer" mechanism needs to be wrapped in a type. So instead of getting a `char*`, you get:

buffer<char, Allocator> buf = str.detach_buffer();

`buf` now owns that memory, with full RAII semantics. You can call `buf.data()` to get a pointer to the array, but it doesn't have any of the container machinery like `begin/end/operator[]/insert/remove/etc`. It's just a buffer with a size, capacity, and allocator.
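One possible shape for such a type is sketched below. Everything here is hypothetical: the class, its constructor and `make_buffer` are invented for illustration, and the sketch assumes trivially destructible elements and an allocator whose pointer type is a plain `T*`.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <type_traits>
#include <utility>

// Hypothetical move-only owner of raw storage: pointer, size, capacity and
// allocator. No container interface; it only holds and releases the memory.
template <typename T, typename Alloc = std::allocator<T>>
class buffer {
    static_assert(std::is_trivially_destructible_v<T>,
                  "sketch only deallocates; it does not destroy elements");
public:
    buffer(T* p, std::size_t size, std::size_t capacity, Alloc a = Alloc()) noexcept
        : alloc_(std::move(a)), data_(p), size_(size), capacity_(capacity) {}

    buffer(buffer&& o) noexcept
        : alloc_(std::move(o.alloc_)),
          data_(std::exchange(o.data_, nullptr)),
          size_(std::exchange(o.size_, 0)),
          capacity_(std::exchange(o.capacity_, 0)) {}

    buffer(const buffer&) = delete;
    buffer& operator=(const buffer&) = delete;
    buffer& operator=(buffer&&) = delete;  // omitted to keep the sketch short

    ~buffer() {
        if (data_ != nullptr)
            std::allocator_traits<Alloc>::deallocate(alloc_, data_, capacity_);
    }

    T* data() const noexcept { return data_; }
    std::size_t size() const noexcept { return size_; }
    std::size_t capacity() const noexcept { return capacity_; }

private:
    Alloc alloc_;
    T* data_;
    std::size_t size_;
    std::size_t capacity_;
};

// Example producer: allocates room for `capacity` ints and constructs `size` of them.
inline buffer<int> make_buffer(std::size_t size, std::size_t capacity) {
    using Traits = std::allocator_traits<std::allocator<int>>;
    std::allocator<int> a;
    int* p = Traits::allocate(a, capacity);
    for (std::size_t i = 0; i < size; ++i)
        Traits::construct(a, p + i, static_cast<int>(i));
    return buffer<int>(p, size, capacity, a);
}
```

Because the handle carries its allocator, the recipient never has to guess how to free the memory, which is the main advantage over returning a bare `char*`.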

Thiago Macieira

Oct 20, 2017, 5:44:39 PM
to std-pr...@isocpp.org
On Friday, 20 October 2017 14:11:49 PDT Nicol Bolas wrote:
> > Note, however, the consequence:
> > char *ptr = str.detach_buffer();
> >
> > may fail.
>
> ... I disagree; this *must* succeed. But it doesn't have to *empty* the
> buffer, just like moving from a `basic_string` can still do a copy if the
> string is in the SSO buffer. This should be able to allocate memory, and
> therefore represents a potentially throwing operation.

Read: it may fail :-)

> This is another reason why I feel that the "detach_buffer" mechanism needs
> to be wrapped in a type. So instead of getting a `char*`, you get:
>
> buffer<char, Allocator> buf = str.detach_buffer();
>
> `buf` now owns that memory, with full RAII semantics. You can call
> `buf.data()` to get a pointer to the array, but it doesn't have any of the
> container machinery like `begin/end/operator[]/insert/remove/etc`. It's
> just a buffer with a size, capacity, and allocator.

I don't see how this helps with this particular problem. If the string was
SSO, then detach_buffer() may still throw, because the returned buffer can't
point into the string object's own storage and so has to be newly allocated.

It may be a good idea as a general principle, though.