Configurable Small Object Optimization for vector and string

Matthew Fioravante

unread,

Mar 3, 2015, 11:00:15 PM3/3/15

to std-pr...@isocpp.org

Modern implementations are encouraged to implement the small string optimization (with implementation defined size) for std::string, reasoning that most strings tend to be small. While this may work very well in the general case, it can be either inadequate or a pessimization in other cases.

It would be nice if string got an additional size_t template argument:


template <typename CharT, typename Traits, typename Allocator, size_t SmallSize = /* implementation defiend*/> class basic_string;

Implementations can make the default SmallSize whatever they like. Setting SmallSize to 0 would disable SSO, falling back to dynamic allocation for all strings.

Configuring the SOO size std::string has many uses cases:

- If you want to convert a large numeric value into a string and that numeric value has a compile time bounded range of values then a string with an SSO buffer large enough to store the largest possible representation without allocating memory is the most efficient string to return from parsing routines.

- A local variable string used to store an operating system path could benefit from using a much larger buffer size of a few kilobytes or more.

- Identifiers such a stock tickers which have a fixed upper bound on length.

Disabling SOO (by setting the size to 0), also has important use cases:

- When storing a std::string as a data member of the class and you know the string will either always be large -OR- the string data is not hot data and you want to keep it off the same cacheline as this. In this case SOO is a major pessimization as it wastes several bytes of precious cache line space.

As a further extension to solving the above issue. An implementation could optimize basic_string<T,Traits,Alloc,0> for size by only storing a single pointer in the basic_string object directly and storing the size, capacity, and the character data within the allocated buffer.

The small object optimization is also very useful for vector. A lot of C++ libraries have implemented a small_vector type which works exactly like std::vector except that it implements the SOO. Instead of implementing another clone of vector, we can add a size_t template argument to vector to also optionally enable SOO.

template <typename T, typename Allocator, size_t SmallSize = 0> class vector;

Thiago Macieira

unread,

Mar 4, 2015, 12:19:36 AM3/4/15

to std-pr...@isocpp.org

On Tuesday 03 March 2015 20:00:15 Matthew Fioravante wrote:
> Modern implementations are encouraged to implement the small string
> optimization (with implementation defined size) for std::string, reasoning
> that most strings tend to be small. While this may work very well in the
> general case, it can be either inadequate or a pessimization in other cases.
>
> It would be nice if string got an additional size_t template argument:
>
>
> template <typename CharT, typename Traits, typename Allocator, size_t
> SmallSize = /* implementation defiend*/> class basic_string;

Please choose a different name than basic_string.

Since std::foo<N> is a different type from std::foo<M> where N != M, I don't
see the point in attempting to keep compatibility with the current
basic_string. Instead, be explicit that this is a different type and define the
SmallSize default value to be sizeof(std::basic_string<CharT>).

> - Identifiers such a stock tickers which have a fixed upper bound on length.

If the fixed upper bound is small enough, you want to have a non-dynamic
version only. SSO isn't that. You want a real "small string".

--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center
PGP/GPG: 0x6EF45358; fingerprint:
E067 918B B660 DBD1 105C 966C 33F5 F005 6EF4 5358

Nevin Liber

unread,

Mar 4, 2015, 1:01:49 AM3/4/15

to std-pr...@isocpp.org

On 3 March 2015 at 22:00, Matthew Fioravante <fmatth...@gmail.com> wrote:

template <typename CharT, typename Traits, typename Allocator, size_t SmallSize = /* implementation defiend*/> class basic_string;

Adding a template parameter is an ABI breakage, and extremely unlikely to pass.

For instance, someone had a proposal in Urbana to add a template parameter to bitset. It went nowhere.

The small object optimization is also very useful for vector.

Besides the ABI breakage, this also breaks things like the nothrow swap guarantee (at least for std::allocator).

--

Nevin ":-)" Liber <mailto:ne...@eviloverlord.com> (847) 691-1404

Arthur O'Dwyer

unread,

Mar 4, 2015, 2:42:49 AM3/4/15

to std-pr...@isocpp.org

On Tuesday, March 3, 2015 at 9:19:36 PM UTC-8, Thiago Macieira wrote:

On Tuesday 03 March 2015 20:00:15 Matthew Fioravante wrote:
> Modern implementations are encouraged to implement the small string
> optimization (with implementation defined size) for std::string, reasoning
> that most strings tend to be small. While this may work very well in the
> general case, it can be either inadequate or a pessimization in other cases.

[...]

Since std::foo<N> is a different type from std::foo<M> where N != M, I don't
see the point in attempting to keep compatibility with the current
basic_string. Instead, be explicit that this is a different type and define the
SmallSize default value to be sizeof(std::basic_string<CharT>).

In my opinion, "small_vector" would be a great addition to the standard library, but only if small_vector<T,N>& and vector<T>& are inter-convertible (in constant time). I believe that's doable, although I'm not sure how to write the implementation in purely standard-conforming C++ — it might require some reinterpret_casts.

The use-case would be things like

struct ChessPosition {
    std::small_vector<ChessPiece,32> pieces; 
};

where ChessPosition's copy constructor and destructor get called very often, so copying the members needs to be very fast. Inlining the data (so no heap allocation needs to happen) is a big win.

But then do we just want ChessPiece pieces[32]? No, because (for the programmer's sanity) we need to track pieces.size() (which won't always be 32; usually it will be much less) separately from the fixed number 32.

But if we know a maximum size for the vector (32 items), do we really want a kind of bounded_vector<ChessPiece,32> which would behave basically like a std::array plus a size member? Well, that would definitely be useful in some cases, and I do wish there were a standard implementation of such a thing. But in the case of ChessPosition, we actually don't want that, because as noted above, pieces.size() will usually be much less than 32. We'd like to be able to tune the N parameter — for example, if we use small_vector<ChessPiece,16>, then early-game positions with more than 16 pieces will require a heap allocation (relatively slower), but every position will take half the RAM (shrinking our memory footprint and the expense of copy-constructing these things).

Some of these use-cases might be solved other ways. For example, bounded_vector could be described as "a vector_view onto an array."

my $.02,

–Arthur

Thiago Macieira

unread,

Mar 4, 2015, 2:58:41 AM3/4/15

to std-pr...@isocpp.org

On Tuesday 03 March 2015 23:42:49 Arthur O'Dwyer wrote:
> In my opinion, "small_vector" would be a great addition to the standard

> library, but *only* if small_vector<T,N>& and vector<T>& are

> inter-convertible (in constant time). I believe that's doable, although I'm

> not sure how to write the *implementation* in purely standard-conforming

> C++ — it might require some reinterpret_casts.

You can't create a copy in constant time, unless you're using reference
counting. You could move in constant time, provided that you could transfer
ownership of the data, but not in this case. The whole objective of the "small
vector" is that it has excslusive space for N elements, so if you moved from
one small_vector<T, N> to anything else, it needs to do a full copy in linear
time.

> The use-case would be things like
>
> struct ChessPosition {
> std::small_vector<ChessPiece,32> pieces;
> };
>

> where ChessPosition's copy constructor and destructor get called *very*
> often, so copying the members needs to be *very* fast. Inlining the data

> (so no heap allocation needs to happen) is a big win.

True, but this is still O(n).

> But if we know a maximum size for the vector (32 items), do we really want
> a kind of bounded_vector<ChessPiece,32> which would behave basically like a
> std::array plus a size member? Well, that would definitely be useful in
> some cases, and I do wish there were a standard implementation of such a
> thing. But in the case of ChessPosition, we actually don't want that,
> because as noted above, pieces.size() will usually be much less than 32.
> We'd like to be able to tune the N parameter — for example, if we use
> small_vector<ChessPiece,16>, then early-game positions with more than 16

> pieces will require a heap allocation (relatively slower), but *every*

> position will take half the RAM (shrinking our memory footprint and the
> expense of copy-constructing these things).
>
> Some of these use-cases might be solved other ways. For example,
> bounded_vector could be described as "a vector_view onto an array."

Makes a lot of sense.

David Rodríguez Ibeas

unread,

Mar 4, 2015, 10:31:03 AM3/4/15

to std-pr...@isocpp.org

I think that your problem is/should be addressed in a different direction, having an array_view/vector_view (I have heard about those in the past, I don't know if there has been any actual proposal, but it would be a simplified string_view). The 'ChessPosition' class looks like it should be an 'std::array', and if interfaces that only act on the data are implemented in terms of a *_view, then it should work just fine.

In current C++, and if you need a dynamic size (i.e. std::array does not solve your problem), you can use an allocator to control where the memory comes from and make it local, you would still be paying for the additional bookkeeping inside vector, but if you need it to actually work that is available today...

David

--

---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposal...@isocpp.org.
To post to this group, send email to std-pr...@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposals/.

dgutson .

unread,

Mar 4, 2015, 10:39:18 AM3/4/15

to std-proposals

El 04/03/2015 12:31, "David Rodríguez Ibeas" <dib...@ieee.org> escribió:
>
> I think that your problem is/should be addressed in a different direction, having an array_view/vector_view (I have heard about those in the past, I don't know if there has been any actual proposal, but it would be a simplified string_view). The 'ChessPosition' class looks like it should be an 'std::array', and if interfaces that only act on the data are implemented in terms of a *_view, then it should work just fine.
>
> In current C++, and if you need a dynamic size (i.e. std::array does not solve your problem), you can use an allocator to control where the memory comes from and make it local, you would still be paying for the additional bookkeeping inside vector, but if you need it to actually work that is available today...

I had the same thought. Why an allocator having an initial pre-allocated buffer of user-specified length and then use basic_string with that allocator wouldn't be an equivalent currently conformant solution?

Matthew Fioravante

unread,

Mar 4, 2015, 12:38:37 PM3/4/15

to std-pr...@isocpp.org

On Wednesday, March 4, 2015 at 12:19:36 AM UTC-5, Thiago Macieira wrote:

> - Identifiers such a stock tickers which have a fixed upper bound on length.

If the fixed upper bound is small enough, you want to have a non-dynamic
version only. SSO isn't that. You want a real "small string".

I agree and this class would be great to have. Good luck getting such a beast approved by the standards committee though.

On Wednesday, March 4, 2015 at 10:31:03 AM UTC-5, David Rodríguez Ibeas wrote:

I think that your problem is/should be addressed in a different direction, having an array_view/vector_view (I have heard about those in the past, I don't know if there has been any actual proposal, but it would be a simplified string_view). The 'ChessPosition' class looks like it should be an 'std::array', and if interfaces that only act on the data are implemented in terms of a *_view, then it should work just fine.

Instead of using an external view object, I would suggest an adapter which contains the actual array and adds runtime size / compile time capacity semantics ontop of it.

The external view approach has a few problems. One is that you have 2 separate objects maintaining state invariants over a single collection of data. Anyone with access to the underlying std::array can easily violate your invariants. Also if you want to pass the aggregate data structure around, you have to pass 2 things instead of one which is a great idiom to use if you want to write bugs. Finally its also less efficient, particularly if you are storing the array and the view as data members of a class. The view will contain a redundant pointer to the array. Not only does this pointer waste valuable cache line bytes, its also an "internal pointer", that is a data member pointer which points to another data member of the class. The idiom of using internal pointers is often frowned upon because it prevents your type from be memcpyable.

On Wednesday, March 4, 2015 at 10:39:18 AM UTC-5, dgutson wrote:

I had the same thought. Why an allocator having an initial pre-allocated buffer of user-specified length and then use basic_string with that allocator wouldn't be an equivalent currently conformant solution?

You can optimize further by putting the SSO feature directly in the class. For example the SSO buffer and the capacity can share space using a union since whenever the SSO buffer is used, we implicitly know the capacity and don't need to store it. If you're storing a vector<small_vector<T,N>>, the space savings can add up.

Even if using an allocator is the chosen solution, a standard allocator to do this along with a small_vector / small_string template alias would be helpful.

Arthur O'Dwyer

unread,

Mar 4, 2015, 2:04:11 PM3/4/15

to std-pr...@isocpp.org

On Tue, Mar 3, 2015 at 11:58 PM, Thiago Macieira <thi...@macieira.org> wrote:

On Tuesday 03 March 2015 23:42:49 Arthur O'Dwyer wrote:
> In my opinion, "small_vector" would be a great addition to the standard
> library, but *only* if small_vector<T,N>& and vector<T>& are
> inter-convertible (in constant time). I believe that's doable, although I'm
> not sure how to write the *implementation* in purely standard-conforming
> C++ — it might require some reinterpret_casts.

You can't create a copy in constant time

I didn't say you should be able to create copies in constant time; I said that small_vector<T,N>& should be convertible to vector<T>& and vice versa.

void generic_func(const std::vector<ChessPiece>& pieces) { ... }

std::small_vector<ChessPiece, 16> mypieces;

generic_func(mypieces);

std::small_vector<ChessPiece, 18> hispieces;

generic_func(hispieces);

There's no (semantic) reason this code shouldn't work, since vector and small_vector have exactly the same API. We shouldn't need to make generic_func a template just to get this to work, and we should even be able to do it without virtuals.

- All vectors and small_vectors have size and capacity members; we just need to make sure they're at the same offsets.

- We might need a boolean flag to indicate whether the data is on the heap or inline; this is the part I'm unsure of.

- Since we know the size and capacity of the vector, we can even push_back, up to a point...

- ...and if we need to reallocate the data, then we have to be conservative and put it on the heap.

Matthew Fioravante wrote:

> For example the SSO buffer and the capacity can share space using a union since whenever

> the SSO buffer is used, we implicitly know the capacity and don't need to store it

I agree that this would be a useful goal, but I'm pretty sure it conflicts with my own pet goal to make std::small_vector<T,M>& interconvertible with std::small_vector<T,N>&, so you get to pick one or the other but not both. In my current design (which is not fully implemented) every kind of small_vector needs to know its "real" capacity; it can't trust its template parameter to reflect reality.

Also, consider:

#define is_SSOed(vec) ((uintptr_t)&vec <= (uintptr_t)vec.data() && (uintptr_t)vec.data() <= (uintptr_t)&vec + sizeof(vec));

std::small_vector<int, 4> vec = { 1, 2, 3 };

assert(vec.size() == 3 && vec.capacity() == 4 && is_SSOed(vec));

vec.push_back(0);

assert(vec.size() == 4 && vec.capacity() == 4 && is_SSOed(vec));

vec.push_back(0);

assert(vec.size() == 5 && vec.capacity() >= 5 && !is_SSOed(vec));

vec.pop_back(); // must not reallocate the vector

assert(vec.size() == 4 && vec.capacity() >= 5 && !is_SSOed(vec));

vec.pop_back(); // must not reallocate the vector

assert(vec.size() == 3 && vec.capacity() >= 5 && !is_SSOed(vec));

That is, (vec.size() <= decltype(vec)::SSO_capacity) does not necessarily imply is_SSOed(vec).

Notice that std::vector<T> could simply be an alias for small_vector<T,0> — that is, std::vector is exactly equivalent to a small_vector where the SSO buffer is big enough to store zero elements.

> The use-case would be things like
>
> struct ChessPosition {
> std::small_vector<ChessPiece,32> pieces;
> };
>
> where ChessPosition's copy constructor and destructor get called *very*
> often, so copying the members needs to be *very* fast. Inlining the data
> (so no heap allocation needs to happen) is a big win.

True, but this is still O(n).

Actually it's O(32), which is to say, O(1). The thing we're trying to optimize here is the constant factor: copying a std::vector of size 32 requires calling (the equivalent of) malloc followed by memcpy, whereas copying a small_vector requires calling (the equivalent of) memcpy alone.

> We'd like to be able to tune the N parameter — for example, if we use
> small_vector<ChessPiece,16>, then early-game positions with more than 16
> pieces will require a heap allocation (relatively slower), but *every*
> position will take half the RAM (shrinking our memory footprint and the
> expense of copy-constructing these things).

This is another example of trying to shrink the constant factor in that O(1). Using small_vector (in my opinion) shouldn't change the semantics of my program; it should merely make my program run faster.

–Arthur

Arthur O'Dwyer

unread,

Mar 4, 2015, 2:24:31 PM3/4/15

to std-pr...@isocpp.org

On Wed, Mar 4, 2015 at 7:39 AM, dgutson . <daniel...@gmail.com> wrote:

El 04/03/2015 12:31, "David Rodríguez Ibeas" <dib...@ieee.org> escribió:
>
> I think that your problem is/should be addressed in a different direction, having an array_view/vector_view (I have heard about those in the past, I don't know if there has been any actual proposal, but it would be a simplified string_view). The 'ChessPosition' class looks like it should be an 'std::array', and if interfaces that only act on the data are implemented in terms of a *_view, then it should work just fine.

Right; bounded_vector is exactly a vector_view onto an array. (Except for the sad fact that SGI STL's vector_view doesn't track "capacity", and so will buffer-overflow if you push too many things onto it, if I read the code correctly. But that would be fixable.)

Notice that SGI vector_view isn't exactly analogous to string_view, because string_view is immutable whereas vector_view allows you to modify elements, push, pop, and resize (but not reserve, because the vector_view doesn't know how to allocate memory).

> In current C++, and if you need a dynamic size (i.e. std::array does not solve your problem), you can use an allocator to control where the memory comes from and make it local, you would still be paying for the additional bookkeeping inside vector, but if you need it to actually work that is available today...

Oh, totally! All these things are "available today", if your definition of "available today" is "you have to write it yourself from scratch." :D I don't think anyone in this thread is claiming that the standard library MUST provide this functionality because it's IMPOSSIBLE to implement by oneself. People (myself included) are claiming that the standard library SHOULD provide this functionality because it's TEDIOUS and ERROR-PRONE to implement by oneself.

Also, unless I'm mistaken, C++14's std::vector doesn't make any guarantees about how much memory it's going to "allocate" via its allocator. In order for an allocator-based solution to work, we need a strong guarantee that std::vector<T>().resize(n) will make exactly one allocation of size n*sizeof(T). This is a good idea and must be coming soon anyway, but technically at the present moment there's no such guarantee, is there? /pedantry

–Arthur

Douglas Boffey

unread,

Mar 4, 2015, 3:17:29 PM3/4/15

to std-pr...@isocpp.org

O(1) time is still unachievable. The best you can hope for is
amortised constant time.

> O(1). Using small_vector (in my opinion) shouldn't change the *semantics*
> of my program; it should merely make my program run *faster*.

Matthew Fioravante

unread,

Mar 4, 2015, 3:27:25 PM3/4/15

to std-pr...@isocpp.org

On Wednesday, March 4, 2015 at 2:04:11 PM UTC-5, Arthur O'Dwyer wrote:

On Tue, Mar 3, 2015 at 11:58 PM, Thiago Macieira <thi...@macieira.org> wrote:
On Tuesday 03 March 2015 23:42:49 Arthur O'Dwyer wrote:
> In my opinion, "small_vector" would be a great addition to the standard
> library, but *only* if small_vector<T,N>& and vector<T>& are
> inter-convertible (in constant time). I believe that's doable, although I'm
> not sure how to write the *implementation* in purely standard-conforming
> C++ — it might require some reinterpret_casts.

You can't create a copy in constant time

I didn't say you should be able to create copies in constant time; I said that small_vector<T,N>& should be convertible to vector<T>& and vice versa.

void generic_func(const std::vector<ChessPiece>& pieces) { ... }

std::small_vector<ChessPiece, 16> mypieces;
generic_func(mypieces);
std::small_vector<ChessPiece, 18> hispieces;
generic_func(hispieces);

If you're only reading/writing the data and not modifying the container itself, you can use array_view.

For your approach to work we have to use inheritance and have small_vector inherit from vector. The only way this approach would acceptable is if std::vector can still operate the same way it always did with zero overhead because std::vector is the most important and most efficiency critical data structure in the entire standard library.

One big problem which probably kills this approach is resizing. If your vector pointer is pointing to the SSO buffer, how does the vector know not to try to free() this pointer after it moves the data to a new dynamically allocated buffer. Adding any additional state to std::vector breaks the zero overhead requirement I just mentioned.

If you really want this level of generality without templates or copy construction, then I believe the correct solution is a vector_view which can keep bools and any other state necessary to handle the SSO logic.

There's no (semantic) reason this code shouldn't work, since vector and small_vector have exactly the same API. We shouldn't need to make generic_func a template just to get this to work, and we should even be able to do it without virtuals.

virtuals are a no-go, std::vector cannot have an additional vptr bloating its size by sizeof(void*). d a flag. Just have the vector base class pointer point to the buffer when SSO is active. Adding a flag to std::vector is also the kind of unacceptable overhead I mentioned before.

- Since we know the size and capacity of the vector, we can even push_back, up to a point...
- ...and if we need to reallocate the data, then we have to be conservative and put it on the heap.

Matthew Fioravante wrote:
> For example the SSO buffer and the capacity can share space using a union since whenever
> the SSO buffer is used, we implicitly know the capacity and don't need to store it

I agree that this would be a useful goal, but I'm pretty sure it conflicts with my own pet goal to make std::small_vector<T,M>& interconvertible with std::small_vector<T,N>&, so you get to pick one or the other but not both. In my current design (which is not fully implemented) every kind of small_vector needs to know its "real" capacity; it can't trust its template parameter to reflect reality.

With the vector_view approach, implementations are not so constrained.

David Rodríguez Ibeas

unread,

Mar 4, 2015, 5:37:51 PM3/4/15

to std-pr...@isocpp.org

On Wed, Mar 4, 2015 at 2:24 PM Arthur O'Dwyer <arthur....@gmail.com> wrote:

On Wed, Mar 4, 2015 at 7:39 AM, dgutson . <daniel...@gmail.com> wrote:

El 04/03/2015 12:31, "David Rodríguez Ibeas" <dib...@ieee.org> escribió:Right; bounded_vector is exactly a vector_view onto an array. (Except for the sad fact that SGI STL's vector_view doesn't track "capacity", and so will buffer-overflow if you push too many things onto it, if I read the code correctly. But that would be fixable.)

I did not mean to have an interface that allows insertion, only a *view* (you can transverse the existing elements but not add/remove). This might not apply to your particular problem, but it is a simple way of representing a sequence of contiguous elements.

Notice that SGI vector_view isn't exactly analogous to string_view, because string_view is immutable whereas vector_view allows you to modify elements, push, pop, and resize (but not reserve, because the vector_view doesn't know how to allocate memory).

The problem with providing an interface for adding/removing elements ins that in some cases you might not be able to (say that the sequence is really an array), or the semantics may be different (vector can always reallocate and grow, you don't need to call 'capacity()' to determine whether you can push one more element).

Oh, totally! All these things are "available today", if your definition of "available today" is "you have to write it yourself from scratch." :D

https://github.com/bloomberg/bde

It is mostly a C++03 library implementation with all of the building blocks for the above. You can pull the 'bsl::allocator' template and the 'bslma::Allocator' protocol and implementation out and use that in your favorite C++11 library. In particular you might want to look at 'bslma::BufferSequentialAllocator' which takes a buffer and allocates out of it if possible, or falls back to the heap (really to another allocator, you can configure which). The work to get this running is not HUGE, it is actually quite small.

Also, unless I'm mistaken, C++14's std::vector doesn't make any guarantees about how much memory it's going to "allocate" via its allocator. In order for an allocator-based solution to work, we need a strong guarantee that std::vector<T>().resize(n) will make exactly one allocation of size n*sizeof(T). This is a good idea and must be coming soon anyway, but technically at the present moment there's no such guarantee, is there? /pedantry

Correct, you need to check your implementation. In my case I know the implementation and I know how the vector will grow, I can verify in test drivers how much memory is allocated.

Again, I don't think polymorphic allocators (so that the type of this vector is not different than any other vector) are for everyone, but they are a solution for some problems. Also, I am not even sure that you will get a huge advantage from using a local array versus dynamically allocated memory (the cost of allocation is becoming smaller and smaller), so using a plain std::vector with tcmalloc might give you most of the performance in most of the cases, but you should really measure.

David

Matthew Fioravante

unread,

Mar 4, 2015, 5:51:28 PM3/4/15

to std-pr...@isocpp.org, dib...@ieee.org

On Wednesday, March 4, 2015 at 5:37:51 PM UTC-5, David Rodríguez Ibeas wrote:

On Wed, Mar 4, 2015 at 2:24 PM Arthur O'Dwyer <arthur....@gmail.com> wrote:
On Wed, Mar 4, 2015 at 7:39 AM, dgutson . <daniel...@gmail.com> wrote:
El 04/03/2015 12:31, "David Rodríguez Ibeas" <dib...@ieee.org> escribió:Right; bounded_vector is exactly a vector_view onto an array. (Except for the sad fact that SGI STL's vector_view doesn't track "capacity", and so will buffer-overflow if you push too many things onto it, if I read the code correctly. But that would be fixable.)
I did not mean to have an interface that allows insertion, only a *view* (you can transverse the existing elements but not add/remove). This might not apply to your particular problem, but it is a simple way of representing a sequence of contiguous elements.

array_view<T> is what you want.

Also, I am not even sure that you will get a huge advantage from using a local array versus dynamically allocated memory (the cost of allocation is becoming smaller and smaller), so using a plain std::vector with tcmalloc might give you most of the performance in most of the cases, but you should really measure.

It really depends on your usage pattern. If you create the buffer once and reuse it multiple times, the allocation cost is probably low. If you're constantly creating new objects (such as using a "modern" API which returns strings by value) then you need the SSO the avoid multiple allocations.

Another use case is one I mentioned earlier:

vector<small_vector<T,N>>;

In this situation, all of the SSO buffers will be lined up linearly in one memory allocation block, providing optimal cache locality. If we can use union tricks to hide data members not needed when SSO is turned on, the cache locality improves further.

Douglas Boffey

unread,

Mar 5, 2015, 8:11:44 AM3/5/15

to std-pr...@isocpp.org

On Wed, Mar 4, 2015 at 8:17 PM, Douglas Boffey <douglas...@gmail.com> wrote:

O(1) time is still unachievable. The best you can hope for is
amortised constant time.

My mistake—I misread the thread.

Reply all

Reply to author

Forward