[llvm-dev] RFC: [SmallVector] Adding SVec<T> and Vec<T> convenience wrappers.

56 views
Skip to first unread message

Sean Silva via llvm-dev

unread,
Nov 13, 2020, 5:06:21 PM11/13/20
to llvm-dev, Duncan P. N. Exon Smith, Mehdi AMINI, Reid Kleckner
We've pretty happy now with a patch that adds two wrappers around SmallVector that make it 1) more convenient to use and 2) will tend to mitigate misuse of SmallVector. We think it's ready for wider discussion: https://reviews.llvm.org/D90884

SVec<T> is a convenience alias for SmallVector<T, N> with N chosen automatically to keep its size under 64 Bytes (that heuristic is easy to change though). The recommendation in the patch is to use this "on the stack, where a "small" number of elements are expected".

Vec<T> is a convenience alias for SmallVector<T, 0>. It lets us get the (little-known?) benefits of SmallVector even when it has no inline elements (see https://llvm.org/docs/ProgrammersManual.html#llvm-adt-smallvector-h). The recommendation in the patch is to use this when the SmallVector is on the heap.

A lot of this is boiled out from the discussion in https://groups.google.com/g/llvm-dev/c/q1OyHZy8KVc/m/1l_AasOLBAAJ?pli=1

The goals here are twofold:

1. convenience: not having to read/write "N", or do an extra edit/recompile cycle if you forgot it

2. avoiding pathological cases: The choice of N is usually semi-arbitrary in our experience, and if one isn't careful, can result in sizeof(SmallVector) becoming huge, especially in the case of nested SmallVectors. This patch avoids pathological cases in two ways:
  A. SVec<T>'s heuristic keeps sizeof(SVec<T>) bounded, which prevents pathological size amplifications like in `SmallVector<struct {SmallVector<T, 4> a, b; }, 4>`, where the small sizes effectively multiply together. Of course, users can always write SmallVector<T, N> explicitly to bypass this, but the need for that seems rare.
  B. SmallVector<T, 0> feels "weird to write" for most folks, even though it is frequently the right choice. Vec<T> mitigates that by "looking natural".

I'm surfacing this as an RFC to get feedback on a couple higher-level points:
- does everybody agree that SVec<T> and Vec<T> are useful to have?
- get wider consensus around suggesting these as "defaults" (see my updates to ProgrammersManual.rst in the patch)
- how much we want to bulk migrate code vs let it organically grow. Replacing SmallVector<T, 0> with Vec<T> should be completely mechanical. Replacing SmallVector<T, N> for general N would be a lot more work.
- of course: naming. SVector/Vector were floated in the patch as well and seem ok. SmallVec was rejected as it was a prefix of SmallVector (messes up autocomplete).

Looking forward to a world with fewer guessed SmallVector sizes,

-- Sean Silva

David Blaikie via llvm-dev

unread,
Nov 16, 2020, 3:55:59 PM11/16/20
to Sean Silva, llvm-dev
I will say I'm not a huge fan of adding even more names for things in
this fairly core/common use case (now we'll have even more vector
names to pick from) - can we use default template arguments so we can
write SmallVector<T> instead of SmallVector<T, N> and would that
address some of the use cases proposed here?

I think someone (JYKnight, perhaps) mentioned in the code review
(always difficult fragmenting the discussion between code review and
RFC, unfortunately - not sure there's a great solution to that - some
way to lock comments on a Phab review might be nice) that there are
cases where you do want a small inline buffer even when you're nested
inside another data structure and/or heap allocated (like tail
allocations).

Got any sense of the total value here? Major savings to be had?

(if SmallVector<T> can do what your proposed SVec<T> does, that leaves
the Vec<T> - could you expound on the benefits of SmallVector<T, 0>
over std::vector<T>? I guess the SmallVectorImpl generic algorithm
opportunities? Though that's rarely needed compared to ArrayRef.)
If SmallVector<T> would suffice then maybe Vec<T> could be
ZeroSmallVector<T>? Not sure.

> _______________________________________________
> LLVM Developers mailing list
> llvm...@lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Mehdi AMINI via llvm-dev

unread,
Nov 16, 2020, 4:55:54 PM11/16/20
to David Blaikie, llvm-dev
On Mon, Nov 16, 2020 at 12:55 PM David Blaikie <dbla...@gmail.com> wrote:
I will say I'm not a huge fan of adding even more names for things in
this fairly core/common use case (now we'll have even more vector
names to pick from) - can we use default template arguments so we can
write SmallVector<T> instead of SmallVector<T, N> and would that
address some of the use cases proposed here?

I won't claim it is perfect, but the added names are a compromise over rounds of reviews with the folks in the revision. In particular I'm quite concerned that a default value for N on the SmallVector does not carry the intent the same way, and is too easy to miss in review (or while reading code). To me the drawbacks are outweighing the benefits too much.
Also, `SmallVector<LargeType>` would end-up with N==0 implicitly, without an easy way to figure it out that there is no actual inline storage while reading the code. An alternative was to reserve the default to only "small object" so that N isn't zero, but there isn't a platform independent way of doing that and keep the code portable I believe. So `SVec<T>` is really saying: "I am willing to pay for some limite inline storage if possible but I don't have a N in mind".

Finally the simple `llvm::Vector` case to replace `SmallVector<T, 0>` is because it is frequently preferable to `std::vector` but still isn't readable or immediately intuitive and so is rarely used in practice (see https://www.llvm.org/docs/ProgrammersManual.html#llvm-adt-smallvector-h for the documented points on N=0).

I think someone (JYKnight, perhaps) mentioned in the code review
(always difficult fragmenting the discussion between code review and
RFC, unfortunately - not sure there's a great solution to that - some
way to lock comments on a Phab review might be nice) that there are
cases where you do want a small inline buffer even when you're nested
inside another data structure and/or heap allocated (like tail
allocations).

Yes: I think it is misleading to formulate anything about heap, I see a SmallVector inside a heap allocated object is akin to a trailing allocation optimization.


Got any sense of the total value here? Major savings to be had?

(if SmallVector<T> can do what your proposed SVec<T> does, that leaves
the Vec<T> - could you expound on the benefits of SmallVector<T, 0>
over std::vector<T>? I guess the SmallVectorImpl generic algorithm
opportunities? Though that's rarely needed compared to ArrayRef.)
If SmallVector<T> would suffice then maybe Vec<T> could be
ZeroSmallVector<T>? Not sure.

ZeroSmallVector<T> does not really address your "more vector names to pick from" concerns, and it is longer than `SmallVector<T, 0>`: shouldn't we aim for the "default case" to be the easiest to reach / most intuitive to pick? `llvm::Vec` looks like "just a vector".

Cheers,

-- 
Mehdi

David Blaikie via llvm-dev

unread,
Nov 16, 2020, 5:12:47 PM11/16/20
to Mehdi AMINI, llvm-dev
On Mon, Nov 16, 2020 at 1:55 PM Mehdi AMINI <joke...@gmail.com> wrote:
> On Mon, Nov 16, 2020 at 12:55 PM David Blaikie <dbla...@gmail.com> wrote:
>>
>> I will say I'm not a huge fan of adding even more names for things in
>> this fairly core/common use case (now we'll have even more vector
>> names to pick from) - can we use default template arguments so we can
>> write SmallVector<T> instead of SmallVector<T, N> and would that
>> address some of the use cases proposed here?
>
>
> I won't claim it is perfect, but the added names are a compromise over rounds of reviews with the folks in the revision. In particular I'm quite concerned that a default value for N on the SmallVector does not carry the intent the same way, and is too easy to miss in review (or while reading code).

I'm not quite following here - what sort of problems do you anticipate
by readers missing the default value for N?

> To me the drawbacks are outweighing the benefits too much.
> Also, `SmallVector<LargeType>` would end-up with N==0 implicitly, without an easy way to figure it out that there is no actual inline storage while reading the code.

Why would it be necessary to figure that out? If we generally feel the
right default inline storage happens to be zero in that case and that
SmallVector will pick a good default (including possibly 0) that seems
like a good thing to me. (why would we call out zero specially, if we
want to avoid calling out other values explicitly)

> An alternative was to reserve the default to only "small object" so that N isn't zero, but there isn't a platform independent way of doing that and keep the code portable I believe. So `SVec<T>` is really saying: "I am willing to pay for some limite inline storage if possible but I don't have a N in mind".

That quoted statement still sounds like it could encapsulate the
possibility of 0 too, though. "limited inline storage if possible"
could be "and the answer to that constraint is zero in this case - but
if you happen to make the T smaller, it could become non-zero
organically/without the need to touch this code" (as the N could vary
organically between non-zero values as struct sizes change too)

> Finally the simple `llvm::Vector` case to replace `SmallVector<T, 0>` is because it is frequently preferable to `std::vector` but still isn't readable or immediately intuitive and so is rarely used in practice (see https://www.llvm.org/docs/ProgrammersManual.html#llvm-adt-smallvector-h for the documented points on N=0).

I'm not sure llvm::Vector/Vec would necessarily make it more often
used in practice. Maybe a bunch of cleanup and more specific wording
in the style guide that would ban std::vector might help more there.

Though I agree that it may be suitable to have a clear name for
SmallVector<T, 0> since the "Small" is slightly misleading (though not
entirely - it's important for the "SmallVectorImpl" benefits of
SmallVector - so renaming it to Vector and still using it via
SmallVectorImpl<T>& might also be confusing).

>> Got any sense of the total value here? Major savings to be had?
>>
>> (if SmallVector<T> can do what your proposed SVec<T> does, that leaves
>> the Vec<T> - could you expound on the benefits of SmallVector<T, 0>
>> over std::vector<T>? I guess the SmallVectorImpl generic algorithm
>> opportunities? Though that's rarely needed compared to ArrayRef.)
>> If SmallVector<T> would suffice then maybe Vec<T> could be
>> ZeroSmallVector<T>? Not sure.
>
>
> ZeroSmallVector<T> does not really address your "more vector names to pick from" concerns,

Somewhat - if SmallVector<T> can be used instead of SVec<T> then we
have one fewer name and less convention to base the Vec<T> on (since
it won't have the SVec<T> sibling), so picking a name closer to the
existing SmallVector might be more viable/helpful.

> and it is longer than `SmallVector<T, 0>`: shouldn't we aim for the "default case" to be the easiest to reach / most intuitive to pick? `llvm::Vec` looks like "just a vector".

Somewhat - but Vec is an abbreviation that isn't really used otherwise
(if we consider SVec as well, I'd still push back on the introduction
of the abbreviation for both cases) and llvm::Vector would be only a
case separation away form a standard name (when considering the
unqualified name - which is likely to come up a fair bit, as we'll see
"Vector" used unqualified a lot).

I wouldn't entirely object to SmallVector<T> getting a smart default
buffer size, and llvm::Vector being an alias for SmallVector<T, 0> - I
don't feel /super/ great about it, but yeah, if we're going to push
the Programmer's Manual encouragement further in terms of
reducing/removing use of std::vector, then maybe llvm::Vector isn't
the worst way to do that.

(it might be that this would benefit from being two separate
discussions, though - one on how to provide smart defaults for
SmallVector<T> and one on if/how to push std::vector deprecation in
favor of SmallVector<T, 0> (semantically speaking - no matter how it's
spelled) further along)

Mehdi AMINI via llvm-dev

unread,
Nov 16, 2020, 5:44:21 PM11/16/20
to David Blaikie, llvm-dev
On Mon, Nov 16, 2020 at 2:12 PM David Blaikie <dbla...@gmail.com> wrote:
On Mon, Nov 16, 2020 at 1:55 PM Mehdi AMINI <joke...@gmail.com> wrote:
> On Mon, Nov 16, 2020 at 12:55 PM David Blaikie <dbla...@gmail.com> wrote:
>>
>> I will say I'm not a huge fan of adding even more names for things in
>> this fairly core/common use case (now we'll have even more vector
>> names to pick from) - can we use default template arguments so we can
>> write SmallVector<T> instead of SmallVector<T, N> and would that
>> address some of the use cases proposed here?
>
>
> I won't claim it is perfect, but the added names are a compromise over rounds of reviews with the folks in the revision. In particular I'm quite concerned that a default value for N on the SmallVector does not carry the intent the same way, and is too easy to miss in review (or while reading code).

I'm not quite following here - what sort of problems do you anticipate
by readers missing the default value for N?

Reading code `SmallVector<SomeType>` does not express that it is *intentional* to leave of the value for N. It can easily be just forgotten, and it could easily be implicitly `0`, and as a reviewer (or code reader later) I don't know if this is was intentional or not. This is why I am quite opposed to "loosen" the current requirement on SmallVector: a different name for a different intent is better fitting to me.
 

>  To me the drawbacks are outweighing the benefits too much.
> Also, `SmallVector<LargeType>` would end-up with N==0 implicitly, without an easy way to figure it out that there is no actual inline storage while reading the code.

Why would it be necessary to figure that out? If we generally feel the
right default inline storage happens to be zero in that case and that
SmallVector will pick a good default (including possibly 0) that seems
like a good thing to me. (why would we call out zero specially, if we
want to avoid calling out other values explicitly)

I think this is an API that is "easy to misuse", I don't see any advantage to it while it has drawbacks: why would we do that?
 

> An alternative was to reserve the default to only "small object" so that N isn't zero, but there isn't a platform independent way of doing that and keep the code portable I believe. So `SVec<T>` is really saying: "I am willing to pay for some limite inline storage if possible but I don't have a N in mind".

That quoted statement still sounds like it could encapsulate the
possibility of 0 too, though. "limited inline storage if possible"
could be "and the answer to that constraint is zero in this case - but
if you happen to make the T smaller, it could become non-zero
organically/without the need to touch this code" (as the N could vary
organically between non-zero values as struct sizes change too)

Yes the quoted statement says exactly that! This is why I don't like having this behavior on SmallVector: this is a different semantics / intent that the programmers have and this is something that I like being able to read differently. 
 SVec does not accept a `N`: if I read SVec<Type> there can't be a possible accidental omission of N here. It has to be a choice between `SmallVector<SomeType, 4>` and `SVec<SomeType>` but not `SmallVector<SomeType>`.
One aspect of the naming is to avoid one not being a prefix of the other (IDE completion, etc.) or them being too close to differentiate.
 
and llvm::Vector would be only a
case separation away form a standard name (when considering the
unqualified name - which is likely to come up a fair bit, as we'll see
"Vector" used unqualified a lot).

Note: this is why we converged with llvm::Vec and not llvm::Vector in the revision
 

I wouldn't entirely object to SmallVector<T> getting a smart default
buffer size, and llvm::Vector being an alias for SmallVector<T, 0> - I
don't feel /super/ great about it, but yeah, if we're going to push
the Programmer's Manual encouragement further in terms of
reducing/removing use of std::vector, then maybe llvm::Vector isn't
the worst way to do that.

(it might be that this would benefit from being two separate
discussions, though - one on how to provide smart defaults for
SmallVector<T> and one on if/how to push std::vector deprecation in
favor of SmallVector<T, 0> (semantically speaking - no matter how it's
spelled) further along)

For historical purpose: the review was actually originally only adding a default for SmallVector and nothing else :)
We only converged to the current proposal after a few rounds of reviews trying to tradeoff amongst folks in the review.

David Blaikie via llvm-dev

unread,
Nov 16, 2020, 7:10:30 PM11/16/20
to Mehdi AMINI, llvm-dev
On Mon, Nov 16, 2020 at 2:44 PM Mehdi AMINI <joke...@gmail.com> wrote:
>
>
>
> On Mon, Nov 16, 2020 at 2:12 PM David Blaikie <dbla...@gmail.com> wrote:
>>
>> On Mon, Nov 16, 2020 at 1:55 PM Mehdi AMINI <joke...@gmail.com> wrote:
>> > On Mon, Nov 16, 2020 at 12:55 PM David Blaikie <dbla...@gmail.com> wrote:
>> >>
>> >> I will say I'm not a huge fan of adding even more names for things in
>> >> this fairly core/common use case (now we'll have even more vector
>> >> names to pick from) - can we use default template arguments so we can
>> >> write SmallVector<T> instead of SmallVector<T, N> and would that
>> >> address some of the use cases proposed here?
>> >
>> >
>> > I won't claim it is perfect, but the added names are a compromise over rounds of reviews with the folks in the revision. In particular I'm quite concerned that a default value for N on the SmallVector does not carry the intent the same way, and is too easy to miss in review (or while reading code).
>>
>> I'm not quite following here - what sort of problems do you anticipate
>> by readers missing the default value for N?
>
>
> Reading code `SmallVector<SomeType>` does not express that it is *intentional* to leave of the value for N. It can easily be just forgotten, and it could easily be implicitly `0`, and as a reviewer (or code reader later) I don't know if this is was intentional or not. This is why I am quite opposed to "loosen" the current requirement on SmallVector: a different name for a different intent is better fitting to me.

I don't know why it being implicitly zero is noteworthy - anymore than
it being implicitly 1 or 3, etc.

As for whether it's intentional: I think if we're moving in this
direction that's proposed, the idea is that by default we don't want
to be explicit about the N - so in general we won't be. And sometimes
someone will want to be explicit and add an N, but otherwise the
reasonable default is not to have an N. I think the intent is clear
enough - consider other default template parameters like customized
deleters on a unique_ptr: I don't wonder if someone intended to have a
customized deleter on every unique_ptr, one assumes that wasn't
required/intended unless it's added. It seems like the idea is for
non-explicit N to be a safe/common default, and explicit N to be
noteworthy/require some justification or scrutiny.

>> > To me the drawbacks are outweighing the benefits too much.
>> > Also, `SmallVector<LargeType>` would end-up with N==0 implicitly, without an easy way to figure it out that there is no actual inline storage while reading the code.
>>
>> Why would it be necessary to figure that out? If we generally feel the
>> right default inline storage happens to be zero in that case and that
>> SmallVector will pick a good default (including possibly 0) that seems
>> like a good thing to me. (why would we call out zero specially, if we
>> want to avoid calling out other values explicitly)
>
>
> I think this is an API that is "easy to misuse", I don't see any advantage to it while it has drawbacks: why would we do that?

What kind of misuse do you have in mind? To me it seems like it would
be consistent with the idea that we have built rules about what good
default inline sizes are - and there's no need for us to think about
it on every use, we just let the default do what it's meant to do.

>> > An alternative was to reserve the default to only "small object" so that N isn't zero, but there isn't a platform independent way of doing that and keep the code portable I believe. So `SVec<T>` is really saying: "I am willing to pay for some limite inline storage if possible but I don't have a N in mind".
>>
>> That quoted statement still sounds like it could encapsulate the
>> possibility of 0 too, though. "limited inline storage if possible"
>> could be "and the answer to that constraint is zero in this case - but
>> if you happen to make the T smaller, it could become non-zero
>> organically/without the need to touch this code" (as the N could vary
>> organically between non-zero values as struct sizes change too)
>
> Yes the quoted statement says exactly that!

Sorry, it seems we're jumbling up the two issues: Whether the type
name should be different when asking for implicit default inline
storage length and, separately, whether explicit zero length storage
should use a different name. The discussion above, I thought, was
about the latter, not the former - but your response seeems to be
about the former rather than the latter.

Two separate discussions/threads/patches may help keep the
communication more clear.

> This is why I don't like having this behavior on SmallVector: this is a different semantics / intent that the programmers have and this is something that I like being able to read differently.
> SVec does not accept a `N`: if I read SVec<Type> there can't be a possible accidental omission of N here. It has to be a choice between `SmallVector<SomeType, 4>` and `SVec<SomeType>` but not `SmallVector<SomeType>`.

To discuss this issue of whether accepting a default size versus
having an explicit size is a different semantic - as above, I think
it's worth considering/comparing this to other templates and their
default template arguments. Things like std::vector's customizable
allocators (you. don't have to use a different name for the container
when you specify a custom allocation scheme for std::vector - but we
don't wonder every time we see std::vector<T> whether the user meant
to customize the allocator - we accept the default is usualy intended
(as the default buffer size would be usually intended) unless it's
specified - if during review we think a custom allocator (or
non-default buffer size) might be merited, we could ask about it - we
probably would even if the user had written SVec<T> the same way we
ask about other data structures today "did you mean SmallVector<T,
3>"?), unique_ptr's custom deleters, etc.

Yeah - certainly a tricky set of tradeoffs (autocomplete, similarity
with standard names, etc). Perhaps a bit of a discussion of what other
libraries do around this (no doubt there are a bunch of C++ libraries
that have "here's the default vector you shuold use instead of
std::vector for these reasons" situations).

>> I wouldn't entirely object to SmallVector<T> getting a smart default
>> buffer size, and llvm::Vector being an alias for SmallVector<T, 0> - I
>> don't feel /super/ great about it, but yeah, if we're going to push
>> the Programmer's Manual encouragement further in terms of
>> reducing/removing use of std::vector, then maybe llvm::Vector isn't
>> the worst way to do that.
>>
>> (it might be that this would benefit from being two separate
>> discussions, though - one on how to provide smart defaults for
>> SmallVector<T> and one on if/how to push std::vector deprecation in
>> favor of SmallVector<T, 0> (semantically speaking - no matter how it's
>> spelled) further along)
>
>
> For historical purpose: the review was actually originally only adding a default for SmallVector and nothing else :)
> We only converged to the current proposal after a few rounds of reviews trying to tradeoff amongst folks in the review.

Perhaps you could summarize some of those discussions/decisions in
more detail here? As the RFC stands, these are my comments - I know
it's always a tradeoff of which audiences are brought in at what
stages (though often folks send ADT/Support changes my way during
review - I should probably just set up a Herald rule for these
things).

Sounds like maybe I am in agreement with the original version of the
review, then.

- Dave

Chris Tetreault via llvm-dev

unread,
Nov 16, 2020, 7:39:02 PM11/16/20
to Sean Silva, Duncan P. N. Exon Smith, Mehdi AMINI, Reid Kleckner, LLVM Dev

I appreciate the work you’ve done on this. My impression is that the thought process that goes into picking the size parameter is something like “Uhh… Uhh… I guess 4 is good?”, so doing work to formalize this is good. I also think that people are likely reluctant to use `SmallVector<T, 0>`.  That said, I don’t think adding new types is useful.

 

I think `SmallVector` should just get a default size that is computed as `SVec`’s default size parameter is. Honestly, I’m unconcerned that anybody would “forget” to populate the size parameter. I really feel like any value that isn’t backed by hard data is unlikely to be better than the result of ` calculateSVecInlineElements<T>()`, and should be questioned in code review given the new heuristic you’ve added.

 

In short, I think that `SmallVector` should get a default size, and that the documentation should become something to the effect of “SmallVector should be preferred to std::vector unless you have a good reason to use the stl container. (api interop?). The default size for SmallVector has been chosen to be reasonable for most use cases. If you have evidence that shows the default is not optimal for your use case you can populate it with a more suitable value.”

 

Again, thanks for working on this. I’m looking forward to the day when I don’t have to pick a number out of my backside to put in that parameter.

 

Thanks,

   Christopher Tetreault

Mehdi AMINI via llvm-dev

unread,
Nov 16, 2020, 7:48:33 PM11/16/20
to David Blaikie, llvm-dev
I get your point, but I don't share the conclusion here: I don't believe that universally not specifying the N on non-trivial types is a good default. It only makes sense to me for small types, hence it is error prone.

 

>> >  To me the drawbacks are outweighing the benefits too much.
>> > Also, `SmallVector<LargeType>` would end-up with N==0 implicitly, without an easy way to figure it out that there is no actual inline storage while reading the code.
>>
>> Why would it be necessary to figure that out? If we generally feel the
>> right default inline storage happens to be zero in that case and that
>> SmallVector will pick a good default (including possibly 0) that seems
>> like a good thing to me. (why would we call out zero specially, if we
>> want to avoid calling out other values explicitly)
>
>
> I think this is an API that is "easy to misuse", I don't see any advantage to it while it has drawbacks: why would we do that?

What kind of misuse do you have in mind?

It does not convey the intent, it is too easy to misuse: i.e. you expect a small storage but you don't have one. This is enough to me to outweigh the benefits of the direction and prefer the status quo.

 
To me it seems like it would
be consistent with the idea that we have built rules about what good
default inline sizes are - and there's no need for us to think about
it on every use, we just let the default do what it's meant to do.

>> > An alternative was to reserve the default to only "small object" so that N isn't zero, but there isn't a platform independent way of doing that and keep the code portable I believe. So `SVec<T>` is really saying: "I am willing to pay for some limite inline storage if possible but I don't have a N in mind".
>>
>> That quoted statement still sounds like it could encapsulate the
>> possibility of 0 too, though. "limited inline storage if possible"
>> could be "and the answer to that constraint is zero in this case - but
>> if you happen to make the T smaller, it could become non-zero
>> organically/without the need to touch this code" (as the N could vary
>> organically between non-zero values as struct sizes change too)
>
> Yes the quoted statement says exactly that!

Sorry, it seems we're jumbling up the two issues: Whether the type
name should be different when asking for implicit default inline
storage length and, separately, whether explicit zero length storage
should use a different name. The discussion above, I thought, was
about the latter, not the former - but your response seeems to be
about the former rather than the latter.

You're answering my quote about SVec, which is the type that is about implicit N, Vec (no S!) is about the N=0 case. The quote is also about the inline storage implicit N, it isn't clear to me how you link this to the 0 case (which does not have inline storage)?

 
 

Two separate discussions/threads/patches may help keep the
communication more clear.

>  This is why I don't like having this behavior on SmallVector: this is a different semantics / intent that the programmers have and this is something that I like being able to read differently.
>  SVec does not accept a `N`: if I read SVec<Type> there can't be a possible accidental omission of N here. It has to be a choice between `SmallVector<SomeType, 4>` and `SVec<SomeType>` but not `SmallVector<SomeType>`.

To discuss this issue of whether accepting a default size versus
having an explicit size is a different semantic - as above, I think
it's worth considering/comparing this to other templates and their
default template arguments. Things like std::vector's customizable
allocators (you. don't have to use a different name for the container
when you specify a custom allocation scheme for std::vector - but we
don't wonder every time we see std::vector<T> whether the user meant
to customize the allocator - we accept the default is usualy intended
(as the default buffer size would be usually intended) unless it's
specified - if during review we think a custom allocator (or
non-default buffer size) might be merited, we could ask about it - we
probably would even if the user had written SVec<T> the same way we
ask about other data structures today "did you mean SmallVector<T,
3>"?), unique_ptr's custom deleters, etc.

I don't necessarily connect with the custom allocator / custom delete analogy right now, they seem too different to me.
I'd look instead at SmallDenseMap and DenseMap maybe, or `std::map` and `std::unordered_map` which have different class names and aren't just differentiated with a trait passed as template argument.
I had looked into this when the revision was sent, but I couldn't find a library with a "vector with inlined storage" and a default N value. 
Both Abseil and Folly have a SmallVector equivalent, Abseil does not have a default: https://github.com/abseil/abseil-cpp/blob/master/absl/container/inlined_vector.h#L69-L71
Folly has also another class, FBVector, which is really an alternative to std::vector with a different memory management strategy (growth factor, etc.). I don't think Abseil has anything else.

Do you have other ideas of related work to look for?

 

>> I wouldn't entirely object to SmallVector<T> getting a smart default
>> buffer size, and llvm::Vector being an alias for SmallVector<T, 0> - I
>> don't feel /super/ great about it, but yeah, if we're going to push
>> the Programmer's Manual encouragement further in terms of
>> reducing/removing use of std::vector, then maybe llvm::Vector isn't
>> the worst way to do that.
>>
>> (it might be that this would benefit from being two separate
>> discussions, though - one on how to provide smart defaults for
>> SmallVector<T> and one on if/how to push std::vector deprecation in
>> favor of SmallVector<T, 0> (semantically speaking - no matter how it's
>> spelled) further along)
>
>
> For historical purpose: the review was actually originally only adding a default for SmallVector and nothing else :)
> We only converged to the current proposal after a few rounds of reviews trying to tradeoff amongst folks in the review.

Perhaps you could summarize some of those discussions/decisions in
more detail here? As the RFC stands, these are my comments - I know
it's always a tradeoff of which audiences are brought in at what
stages (though often folks send ADT/Support changes my way during
review - I should probably just set up a Herald rule for these
things).

Sounds like maybe I am in agreement with the original version of the
review, then.

Likely: the review evolved because I opposed to changing the status quo in this direction I guess.

Best,

-- 
Mehdi

Sean Silva via llvm-dev

unread,
Nov 16, 2020, 9:14:36 PM11/16/20
to Mehdi AMINI, llvm-dev
Actually, the reason I originally pursued this was that I used a SmallDenseMap and happily found that it had a default number of inline elements: https://github.com/llvm/llvm-project/blob/69cd776e1ee79e72ccbdad30749eac04579715ee/llvm/include/llvm/ADT/DenseMap.h#L877
I thought "why not SmallVector, since I use that a lot more often"? But based on historical misuse of SmallVector, using a hardcoded number like "4" (like SmallDenseMap does) as the default doesn't make much sense, so I pursued a bounded-sizeof approach.
 

 

>> I wouldn't entirely object to SmallVector<T> getting a smart default
>> buffer size, and llvm::Vector being an alias for SmallVector<T, 0> - I
>> don't feel /super/ great about it, but yeah, if we're going to push
>> the Programmer's Manual encouragement further in terms of
>> reducing/removing use of std::vector, then maybe llvm::Vector isn't
>> the worst way to do that.
>>
>> (it might be that this would benefit from being two separate
>> discussions, though - one on how to provide smart defaults for
>> SmallVector<T> and one on if/how to push std::vector deprecation in
>> favor of SmallVector<T, 0> (semantically speaking - no matter how it's
>> spelled) further along)
>
>
> For historical purpose: the review was actually originally only adding a default for SmallVector and nothing else :)
> We only converged to the current proposal after a few rounds of reviews trying to tradeoff amongst folks in the review.

Perhaps you could summarize some of those discussions/decisions in
more detail here? As the RFC stands, these are my comments - I know
it's always a tradeoff of which audiences are brought in at what
stages (though often folks send ADT/Support changes my way during
review - I should probably just set up a Herald rule for these
things).

Sounds like maybe I am in agreement with the original version of the
review, then.

Likely: the review evolved because I opposed to changing the status quo in this direction I guess.

It sounds like there is pretty good consensus here to just let `SmallVector<T>` "just work" and choose a default N. The one exception seems to be that Mehdi has a strong objection having the default N possibly be 0 (to ensure the bounded-sizeof property). So let's shift the discussion with a laser focus to specifically that point.

Responding to the arguments I've seen presented so far:
1. I'm not convinced that SmallVector<T> with defaulted N==0 is really error prone. Mehdi, I like your definition of "I am willing to pay for some limite inline storage if possible but I don't have a N in mind" and I think we can apply it to SmallVector<T>; I don't think adding SVec<T> is needed for that.
2. I don't understand the concern about "forgetting the N". In the "new world" of `SmallVector<T>`, any `SmallVector<T, N>` in new code would be treated as "I profiled this code and this explicit N is the right choice". That is, adding an explicit `N` is going to be rare and deliberate (possibly with reviewers insisting on a comment!) -- this is not something that one can "forget".

For point 2., I think we will have some transition period where reviewers will organically start to propagate the new advice (and patch authors enjoying not having to write N), but I think the direction is effectively what I stated.

-- Sean Silva

Mehdi AMINI via llvm-dev

unread,
Nov 16, 2020, 9:45:34 PM11/16/20
to Sean Silva, llvm-dev
I understand your point, I still strongly disagree for the reason previously discussed: in particular I don't understand why applying it to SmallVector would be better than what we converged on with SVec.

Mehdi AMINI via llvm-dev

unread,
Nov 16, 2020, 9:53:19 PM11/16/20
to Sean Silva, llvm-dev
I'd add that we have SmallVector the way it is and it has been serving us pretty well so far: I'm not unhappy with the status quo for example. Changing the status quo deserves a stronger case than what I see here, in particular in light of the downside I see and also the alternative (SVec<T>).

 
2. I don't understand the concern about "forgetting the N". In the "new world" of `SmallVector<T>`, any `SmallVector<T, N>` in new code would be treated as "I profiled this code and this explicit N is the right choice". That is, adding an explicit `N` is going to be rare and deliberate (possibly with reviewers insisting on a comment!) -- this is not something that one can "forget". 

That may be another aspect of our disconnect: you seem to propose a different approach to SmallVector where N is exceptional. I am seeing your addition as a convenience for a default case, that does not preclude from using local knowledge to be more specific. You may be right that on the long run we'd end up with a majority of unspecified N (I think I proposed in the revision llvm::Vector for this instead by the way), I just see SmallVector as something that is more of a conscientious choice at the moment.

Chris Lattner via llvm-dev

unread,
Nov 17, 2020, 10:26:37 AM11/17/20
to Sean Silva, llvm-dev
On Nov 13, 2020, at 2:06 PM, Sean Silva via llvm-dev <llvm...@lists.llvm.org> wrote:
We've pretty happy now with a patch that adds two wrappers around SmallVector that make it 1) more convenient to use and 2) will tend to mitigate misuse of SmallVector. We think it's ready for wider discussion: https://reviews.llvm.org/D90884

SVec<T> is a convenience alias for SmallVector<T, N> with N chosen automatically to keep its size under 64 Bytes (that heuristic is easy to change though). The recommendation in the patch is to use this "on the stack, where a "small" number of elements are expected".

Hey Sean,

I agree with other comments that his approach unnecessarily fragments the API surface area for a core class.  You’re doing two things here: 1) adding a new name for an existing thing, and 2) adding a default.

My major concern and objection is about #1, since the codebase will get to be a mishmash of the two.  I’d rather keep the world consistent here and have an opinionated “one way to do it”.

Thoughts/suggestions:
 - Adding the default seems very reasonable to me, and I think that 64 bytes is a good default.  I think you should change the behavior so that SmallVector<LargeThing> defaults to a single inline element instead of zero though.  Perhaps generate a static_assert when it is crazy large.

 - The name SmallVector has always been confusing, InlinedVector is more principled, but is even more verbose for such a common type.  If you think it is worth shrinkifying the name SmallVector, then I think the best thing to do is *rename* SmallVector (perhaps to IVector?) everywhere in the codebase.  I don’t think that introducing a redundant name is a good thing (except to smooth the transition).

 - If people are concerned about the default being bad, then you could choose to make “SmallVector<T, -1>” be the thing that autosizes.  I tend to agree with your viewpoint that the N chosen is arbitrary in almost all cases anyway though, so I wouldn’t go this way.


Vec<T> is a convenience alias for SmallVector<T, 0>. It lets us get the (little-known?) benefits of SmallVector even when it has no inline elements (see https://llvm.org/docs/ProgrammersManual.html#llvm-adt-smallvector-h). The recommendation in the patch is to use this when the SmallVector is on the heap.

These benefits are tiny, I really don’t think it is worth introducing a new name for this.  I think this is better handled with a change to CodingStandards (saying “don’t use std::vector”) and the programmers manual.

-Chris

Sean Silva via llvm-dev

unread,
Nov 17, 2020, 4:40:51 PM11/17/20
to Chris Lattner, llvm-dev
On Tue, Nov 17, 2020 at 7:26 AM Chris Lattner <clat...@nondot.org> wrote:
On Nov 13, 2020, at 2:06 PM, Sean Silva via llvm-dev <llvm...@lists.llvm.org> wrote:
We've pretty happy now with a patch that adds two wrappers around SmallVector that make it 1) more convenient to use and 2) will tend to mitigate misuse of SmallVector. We think it's ready for wider discussion: https://reviews.llvm.org/D90884

SVec<T> is a convenience alias for SmallVector<T, N> with N chosen automatically to keep its size under 64 Bytes (that heuristic is easy to change though). The recommendation in the patch is to use this "on the stack, where a "small" number of elements are expected".

Hey Sean,

I agree with other comments that his approach unnecessarily fragments the API surface area for a core class.  You’re doing two things here: 1) adding a new name for an existing thing, and 2) adding a default.

My major concern and objection is about #1, since the codebase will get to be a mishmash of the two.  I’d rather keep the world consistent here and have an opinionated “one way to do it”.

Thanks Chris. That was my original proposal when I first drafted the patch, and I'm happy to just add the default.
 

Thoughts/suggestions:
 - Adding the default seems very reasonable to me, and I think that 64 bytes is a good default.  I think you should change the behavior so that SmallVector<LargeThing> defaults to a single inline element instead of zero though.  Perhaps generate a static_assert when it is crazy large.

That would work for me.

Mehdi, would SmallVector<T>, defaulting to a minimum single inline element with static_assert(sizeof(T) < 512) address your concerns?

-- Sean Silva

David Blaikie via llvm-dev

unread,
Nov 17, 2020, 4:42:32 PM11/17/20
to Chris Lattner, llvm-dev
On Tue, Nov 17, 2020 at 7:26 AM Chris Lattner via llvm-dev
<llvm...@lists.llvm.org> wrote:
>
> On Nov 13, 2020, at 2:06 PM, Sean Silva via llvm-dev <llvm...@lists.llvm.org> wrote:
>
> We've pretty happy now with a patch that adds two wrappers around SmallVector that make it 1) more convenient to use and 2) will tend to mitigate misuse of SmallVector. We think it's ready for wider discussion: https://reviews.llvm.org/D90884
>
> SVec<T> is a convenience alias for SmallVector<T, N> with N chosen automatically to keep its size under 64 Bytes (that heuristic is easy to change though). The recommendation in the patch is to use this "on the stack, where a "small" number of elements are expected".
>
>
> Hey Sean,
>
> I agree with other comments that his approach unnecessarily fragments the API surface area for a core class. You’re doing two things here: 1) adding a new name for an existing thing, and 2) adding a default.
>
> My major concern and objection is about #1, since the codebase will get to be a mishmash of the two. I’d rather keep the world consistent here and have an opinionated “one way to do it”.
>
> Thoughts/suggestions:
> - Adding the default seems very reasonable to me, and I think that 64 bytes is a good default. I think you should change the behavior so that SmallVector<LargeThing> defaults to a single inline element instead of zero though. Perhaps generate a static_assert when it is crazy large.

Out of curiosity: Why a single rather than zero?

> - The name SmallVector has always been confusing, InlinedVector is more principled, but is even more verbose for such a common type. If you think it is worth shrinkifying the name SmallVector, then I think the best thing to do is *rename* SmallVector (perhaps to IVector?) everywhere in the codebase. I don’t think that introducing a redundant name is a good thing (except to smooth the transition).
>
> - If people are concerned about the default being bad, then you could choose to make “SmallVector<T, -1>” be the thing that autosizes. I tend to agree with your viewpoint that the N chosen is arbitrary in almost all cases anyway though, so I wouldn’t go this way.
>
>
> Vec<T> is a convenience alias for SmallVector<T, 0>. It lets us get the (little-known?) benefits of SmallVector even when it has no inline elements (see https://llvm.org/docs/ProgrammersManual.html#llvm-adt-smallvector-h). The recommendation in the patch is to use this when the SmallVector is on the heap.
>
>
> These benefits are tiny, I really don’t think it is worth introducing a new name for this. I think this is better handled with a change to CodingStandards (saying “don’t use std::vector”) and the programmers manual.
>
> -Chris
>

Mehdi AMINI via llvm-dev

unread,
Nov 17, 2020, 5:15:33 PM11/17/20
to Sean Silva, llvm-dev
On Tue, Nov 17, 2020 at 1:40 PM Sean Silva <chiso...@gmail.com> wrote:


On Tue, Nov 17, 2020 at 7:26 AM Chris Lattner <clat...@nondot.org> wrote:
On Nov 13, 2020, at 2:06 PM, Sean Silva via llvm-dev <llvm...@lists.llvm.org> wrote:
We've pretty happy now with a patch that adds two wrappers around SmallVector that make it 1) more convenient to use and 2) will tend to mitigate misuse of SmallVector. We think it's ready for wider discussion: https://reviews.llvm.org/D90884

SVec<T> is a convenience alias for SmallVector<T, N> with N chosen automatically to keep its size under 64 Bytes (that heuristic is easy to change though). The recommendation in the patch is to use this "on the stack, where a "small" number of elements are expected".

Hey Sean,

I agree with other comments that his approach unnecessarily fragments the API surface area for a core class.  You’re doing two things here: 1) adding a new name for an existing thing, and 2) adding a default.

My major concern and objection is about #1, since the codebase will get to be a mishmash of the two.  I’d rather keep the world consistent here and have an opinionated “one way to do it”.

Thanks Chris. That was my original proposal when I first drafted the patch, and I'm happy to just add the default.
 

Thoughts/suggestions:
 - Adding the default seems very reasonable to me, and I think that 64 bytes is a good default.  I think you should change the behavior so that SmallVector<LargeThing> defaults to a single inline element instead of zero though.  Perhaps generate a static_assert when it is crazy large.

That would work for me.

Mehdi, would SmallVector<T>, defaulting to a minimum single inline element with static_assert(sizeof(T) < 512) address your concerns?

Looks fine! Thanks

Duncan P. N. Exon Smith via llvm-dev

unread,
Nov 17, 2020, 5:16:16 PM11/17/20
to Sean Silva, Chris Lattner, LLVM Dev

On 2020 Nov  17, at 13:40, Sean Silva <chiso...@gmail.com> wrote:



On Tue, Nov 17, 2020 at 7:26 AM Chris Lattner <clat...@nondot.org> wrote:
On Nov 13, 2020, at 2:06 PM, Sean Silva via llvm-dev <llvm...@lists.llvm.org> wrote:
We've pretty happy now with a patch that adds two wrappers around SmallVector that make it 1) more convenient to use and 2) will tend to mitigate misuse of SmallVector. We think it's ready for wider discussion: https://reviews.llvm.org/D90884

SVec<T> is a convenience alias for SmallVector<T, N> with N chosen automatically to keep its size under 64 Bytes (that heuristic is easy to change though). The recommendation in the patch is to use this "on the stack, where a "small" number of elements are expected".

Hey Sean,

I agree with other comments that his approach unnecessarily fragments the API surface area for a core class.  You’re doing two things here: 1) adding a new name for an existing thing, and 2) adding a default.

My major concern and objection is about #1, since the codebase will get to be a mishmash of the two.  I’d rather keep the world consistent here and have an opinionated “one way to do it”.

Thanks Chris. That was my original proposal when I first drafted the patch, and I'm happy to just add the default.
 

Thoughts/suggestions:
 - Adding the default seems very reasonable to me, and I think that 64 bytes is a good default.  I think you should change the behavior so that SmallVector<LargeThing> defaults to a single inline element instead of zero though.  Perhaps generate a static_assert when it is crazy large.

That would work for me.

Mehdi, would SmallVector<T>, defaulting to a minimum single inline element with static_assert(sizeof(T) < 512) address your concerns?

I'd be a bit concerned about this for two reasons:

- Adding a large inlined element by-default may reinforce one of the problems with SmallVector, that people accidentally put big vectors in places that shouldn't have them. If we're going to make accidentally large inlined vectors more ergonomic, I think we ought to make 0-element versions more ergonomic too.

- The `static_assert` is going to fail differently on different platforms, since `T`'s size will tend to be platform-dependent for large `T`. I'm not sure it'll be predictable enough.

-- Sean Silva
 

 - The name SmallVector has always been confusing, InlinedVector is more principled, but is even more verbose for such a common type.  If you think it is worth shrinkifying the name SmallVector, then I think the best thing to do is *rename* SmallVector (perhaps to IVector?) everywhere in the codebase.  I don’t think that introducing a redundant name is a good thing (except to smooth the transition).

I like the idea of a minor rename, although a bit different:
- SmallVectorBase => VectorBase
- SmallVectorImpl => VectorImpl (we could keep both around for a transition period)
- (etc.)
- SmallVector => Vector, but give `Vector` a default small size of 0
- Add SmallVector as a wrapper around Vector that has a nice default for the small size:
```
template <class T>
using SmallVector = Vector<T, DefaultInlinedElementSize<T>>;
```

Sean Silva via llvm-dev

unread,
Nov 17, 2020, 8:29:51 PM11/17/20
to Duncan P. N. Exon Smith, LLVM Dev
On Tue, Nov 17, 2020 at 2:16 PM Duncan P. N. Exon Smith <dexon...@apple.com> wrote:


On 2020 Nov  17, at 13:40, Sean Silva <chiso...@gmail.com> wrote:



On Tue, Nov 17, 2020 at 7:26 AM Chris Lattner <clat...@nondot.org> wrote:
On Nov 13, 2020, at 2:06 PM, Sean Silva via llvm-dev <llvm...@lists.llvm.org> wrote:
We've pretty happy now with a patch that adds two wrappers around SmallVector that make it 1) more convenient to use and 2) will tend to mitigate misuse of SmallVector. We think it's ready for wider discussion: https://reviews.llvm.org/D90884

SVec<T> is a convenience alias for SmallVector<T, N> with N chosen automatically to keep its size under 64 Bytes (that heuristic is easy to change though). The recommendation in the patch is to use this "on the stack, where a "small" number of elements are expected".

Hey Sean,

I agree with other comments that his approach unnecessarily fragments the API surface area for a core class.  You’re doing two things here: 1) adding a new name for an existing thing, and 2) adding a default.

My major concern and objection is about #1, since the codebase will get to be a mishmash of the two.  I’d rather keep the world consistent here and have an opinionated “one way to do it”.

Thanks Chris. That was my original proposal when I first drafted the patch, and I'm happy to just add the default.
 

Thoughts/suggestions:
 - Adding the default seems very reasonable to me, and I think that 64 bytes is a good default.  I think you should change the behavior so that SmallVector<LargeThing> defaults to a single inline element instead of zero though.  Perhaps generate a static_assert when it is crazy large.

That would work for me.

Mehdi, would SmallVector<T>, defaulting to a minimum single inline element with static_assert(sizeof(T) < 512) address your concerns?

I'd be a bit concerned about this for two reasons:

- Adding a large inlined element by-default may reinforce one of the problems with SmallVector, that people accidentally put big vectors in places that shouldn't have them. If we're going to make accidentally large inlined vectors more ergonomic, I think we ought to make 0-element versions more ergonomic too.

I would prefer 0, but I'm ok with 1 because SmallVector<SmallVector<T>> avoids the "exponential" effect of SmallVector<SmallVector<T, 4>, 2> == 8 inlined T's (+ headers and stuff).

As an experiment, I added in a static_assert(sizeof(T) < 512) into SmallVector. Almost all the errors were due to nested SmallVector<SmallVector<T, 4 or 8>, 4 or 8> and such. The only case I could see in the error log that was *not* due to nested SmallVector's was this legitimately enormous struct in llvm-objcopy: https://github.com/llvm/llvm-project/blob/67e0f791c93a23d0a523f3f05082c020f7c9109f/llvm/tools/llvm-objcopy/CopyConfig.h#L149 (which, btw, is used only in a SmallVector<T, 1>)

So I feel pretty confident that 1 inline element from SmallVector<LargeType> is probably less or equal to what users would choose. Thus, I still think 1 inline element minimum is net positive if Chris won't budge on lowering that to 0.

Chris, thoughts? 

 

- The `static_assert` is going to fail differently on different platforms, since `T`'s size will tend to be platform-dependent for large `T`. I'm not sure it'll be predictable enough.

Agreed.
 

-- Sean Silva
 

 - The name SmallVector has always been confusing, InlinedVector is more principled, but is even more verbose for such a common type.  If you think it is worth shrinkifying the name SmallVector, then I think the best thing to do is *rename* SmallVector (perhaps to IVector?) everywhere in the codebase.  I don’t think that introducing a redundant name is a good thing (except to smooth the transition).

I like the idea of a minor rename, although a bit different:
- SmallVectorBase => VectorBase
- SmallVectorImpl => VectorImpl (we could keep both around for a transition period)
- (etc.)
- SmallVector => Vector, but give `Vector` a default small size of 0
- Add SmallVector as a wrapper around Vector that has a nice default for the small size:
```
template <class T>
using SmallVector = Vector<T, DefaultInlinedElementSize<T>>;
```

I'm generally +1 on making SmallVector<T, 0> be easier to use, but it seems that the discussion of default N is already complex enough, so for now let's focus only on the default N situation.

-- Sean Silva

Sean Silva via llvm-dev

unread,
Nov 25, 2020, 6:47:42 PM11/25/20
to Duncan P. N. Exon Smith, LLVM Dev
Ping.

It sounds like we left off with consensus around just making SmallVector<T> have a default N.

We were converging to a default N policy of:
- at least 1 inlined element
- sizeof(SmallVector<T>) <= 64 when it doesn't contradict the "at least 1 inlined element".

Any objections to moving forward with that?

-- Sean Silva

David Blaikie via llvm-dev

unread,
Nov 25, 2020, 6:52:09 PM11/25/20
to Sean Silva, LLVM Dev
On Wed, Nov 25, 2020 at 3:47 PM Sean Silva via llvm-dev
<llvm...@lists.llvm.org> wrote:
>
> Ping.
>
> It sounds like we left off with consensus around just making SmallVector<T> have a default N.
>
> We were converging to a default N policy of:
> - at least 1 inlined element

I'm still not sure why "at least 1" is an important goal. I'd be
curious to hear from Chris/others why that's especially valuable.

Michael Kruse via llvm-dev

unread,
Nov 26, 2020, 1:56:18 AM11/26/20
to David Blaikie, LLVM Dev
Am Mi., 25. Nov. 2020 um 17:52 Uhr schrieb David Blaikie via llvm-dev
<llvm...@lists.llvm.org>:

> I'm still not sure why "at least 1" is an important goal. I'd be
> curious to hear from Chris/others why that's especially valuable.

If N is 0, then it's not a small-size optimized vector, in contrast to
what the name implies.

Actually, if the intention is to have no inlined elements, I think one
should use std:vector. Generally, projects should not reinvent STL
containers and prefer the standard library one unless there is a good
enough reason not to use them.


Michael

Nicolai Hähnle via llvm-dev

unread,
Nov 26, 2020, 2:42:24 AM11/26/20
to Michael Kruse, LLVM Dev
On Thu, Nov 26, 2020 at 7:56 AM Michael Kruse via llvm-dev
<llvm...@lists.llvm.org> wrote:
> Am Mi., 25. Nov. 2020 um 17:52 Uhr schrieb David Blaikie via llvm-dev
> <llvm...@lists.llvm.org>:
> > I'm still not sure why "at least 1" is an important goal. I'd be
> > curious to hear from Chris/others why that's especially valuable.
>
> If N is 0, then it's not a small-size optimized vector, in contrast to
> what the name implies.
>
> Actually, if the intention is to have no inlined elements, I think one
> should use std:vector. Generally, projects should not reinvent STL
> containers and prefer the standard library one unless there is a good
> enough reason not to use them.

I agree for regular code, but SmallVector can be and is used in
templates as SmallVector<T, N> where T is not known to the code
writer. I'm also curious what makes the number 1 so special.

Cheers,
Nicolai

--
Lerne, wie die Welt wirklich ist,
aber vergiss niemals, wie sie sein sollte.

Chris Lattner via llvm-dev

unread,
Nov 27, 2020, 11:45:53 PM11/27/20
to David Blaikie, llvm-dev
Sorry for falling off the map on this thread:

On Nov 17, 2020, at 1:42 PM, David Blaikie <dbla...@gmail.com> wrote:
Thoughts/suggestions:
- Adding the default seems very reasonable to me, and I think that 64 bytes is a good default.  I think you should change the behavior so that SmallVector<LargeThing> defaults to a single inline element instead of zero though.  Perhaps generate a static_assert when it is crazy large.

Out of curiosity: Why a single rather than zero?

My rationale for this is basically that SmallVector is typically used for the case when you want to avoid an out-of-line allocation for a small number of elements, this was the reason it was created.  While there is some performance benefits of SmallVector<T,0> over std::vector<> they are almost trivial.  I don’t see why we’d recommend SmallVector<T,0> over vector to get those.  If we were in favor of banning std::vector, then I think the reason would be for consistency across the codebase and to get things like pop_back_val().

One other way to handle this is to make there be a static_assert for the case where the size is inferred to zero.

On Nov 25, 2020, at 10:55 PM, Michael Kruse via llvm-dev <llvm...@lists.llvm.org> wrote:
Am Mi., 25. Nov. 2020 um 17:52 Uhr schrieb David Blaikie via llvm-dev
<llvm...@lists.llvm.org>:
I'm still not sure why "at least 1" is an important goal. I'd be
curious to hear from Chris/others why that's especially valuable.

If N is 0, then it's not a small-size optimized vector, in contrast to
what the name implies.

Right, exactly.

On Nov 17, 2020, at 2:15 PM, Duncan P. N. Exon Smith <dexon...@apple.com> wrote:
I'd be a bit concerned about this for two reasons:

- Adding a large inlined element by-default may reinforce one of the problems with SmallVector, that people accidentally put big vectors in places that shouldn't have them. If we're going to make accidentally large inlined vectors more ergonomic, I think we ought to make 0-element versions more ergonomic too.

I don’t understand your rationale here: the issue in question comes from when T is large not when the count is large (since we’re talking about the inferred count case).

- The `static_assert` is going to fail differently on different platforms, since `T`'s size will tend to be platform-dependent for large `T`. I'm not sure it'll be predictable enough.

I agree this is a problem.  One way to do that is to make the assert only trip on sizeof(void*)==8 hosts or something to make it more “incomplete but (more) stable”.

-Chris

David Blaikie via llvm-dev

unread,
Nov 30, 2020, 2:01:58 PM11/30/20
to Chris Lattner, llvm-dev
On Fri, Nov 27, 2020 at 8:45 PM Chris Lattner <clat...@nondot.org> wrote:
Sorry for falling off the map on this thread:

On Nov 17, 2020, at 1:42 PM, David Blaikie <dbla...@gmail.com> wrote:
Thoughts/suggestions:
- Adding the default seems very reasonable to me, and I think that 64 bytes is a good default.  I think you should change the behavior so that SmallVector<LargeThing> defaults to a single inline element instead of zero though.  Perhaps generate a static_assert when it is crazy large.

Out of curiosity: Why a single rather than zero?

My rationale for this is basically that SmallVector is typically used for the case when you want to avoid an out-of-line allocation for a small number of elements, this was the reason it was created.  While there is some performance benefits of SmallVector<T,0> over std::vector<> they are almost trivial.  I don’t see why we’d recommend SmallVector<T,0> over vector to get those.

Fair - though we do recommend it: https://llvm.org/docs/ProgrammersManual.html#llvm-adt-smallvector-h
"SmallVector has grown a few other minor advantages over std::vector, causing SmallVector<Type, 0> to be preferred over std::vector<Type>."

Happy if that decision's being revisited.

But even then, there might be some benefit (in addition to the other benefits mentioned in the Programmers Manual - though I'm not fully on board with the "never use std::vector" idea... undecided there) to not having to churn the name of a type (& possibly the subsequent uses of SmallVectorImpl, etc) when a struct changes size tipping it over the limit for SmallVector?
 

Duncan P. N. Exon Smith via llvm-dev

unread,
Nov 30, 2020, 8:35:02 PM11/30/20
to Michael Kruse, LLVM Dev

> On 2020 Nov 25, at 22:55, Michael Kruse via llvm-dev <llvm...@lists.llvm.org> wrote:
>
> Am Mi., 25. Nov. 2020 um 17:52 Uhr schrieb David Blaikie via llvm-dev
> <llvm...@lists.llvm.org>:
>> I'm still not sure why "at least 1" is an important goal. I'd be
>> curious to hear from Chris/others why that's especially valuable.
>
> If N is 0, then it's not a small-size optimized vector, in contrast to
> what the name implies.
>
> Actually, if the intention is to have no inlined elements, I think one
> should use std:vector. Generally, projects should not reinvent STL
> containers and prefer the standard library one unless there is a good
> enough reason not to use them.

SmallVector<T, 0> has a number of benefits that make it better in practice than std::vector, at least for LLVM code:
- Interoperates with the pervasively used SmallVectorImpl<T>.
- No exception guarantees, and thus none of the related, harmful pessimizations (std::vector::resize will do expensive copies instead of cheap moves in some cases, last I checked).
- Customizations for using memcpy more often.
- Smaller by a pointer for most `T` (64-bit pointers).

Agreed that the name is wrong and not ergonomic, but we should not be using std::vector pretty much ever IMO.

Duncan P. N. Exon Smith via llvm-dev

unread,
Nov 30, 2020, 8:44:43 PM11/30/20
to Chris Lattner, LLVM Dev

On 2020 Nov  27, at 20:45, Chris Lattner via llvm-dev <llvm...@lists.llvm.org> wrote:

Sorry for falling off the map on this thread:

On Nov 17, 2020, at 1:42 PM, David Blaikie <dbla...@gmail.com> wrote:
Thoughts/suggestions:
- Adding the default seems very reasonable to me, and I think that 64 bytes is a good default.  I think you should change the behavior so that SmallVector<LargeThing> defaults to a single inline element instead of zero though.  Perhaps generate a static_assert when it is crazy large.

Out of curiosity: Why a single rather than zero?

My rationale for this is basically that SmallVector is typically used for the case when you want to avoid an out-of-line allocation for a small number of elements, this was the reason it was created.  While there is some performance benefits of SmallVector<T,0> over std::vector<> they are almost trivial.

The performance benefits aren't trivial.

std::vector grow operations will refuse to use std::move for some T, a pessimization required by its exception guarantees, even if you're building with `-fno-exceptions`. We had a massive compile-time problem in 2016 related to this that I fixed with 3c406c2da52302eb5cced431373f240b9c037841 by switching to SmallVector<T,0>. You can see the history in r338071 / 0f81faed05c3c7c1fbaf6af402411c99d715cf56.

James Y Knight via llvm-dev

unread,
Nov 30, 2020, 8:57:20 PM11/30/20
to Duncan P. N. Exon Smith, LLVM Dev
That issue, at least, is fixable without switching from std::vector just by adding noexcept to the appropriate user-defined move constructors.

Sean Silva via llvm-dev

unread,
Nov 30, 2020, 9:05:06 PM11/30/20
to Chris Lattner, llvm-dev
I'm okay with either 1 or 0 inlined elements for huge value types.

I can buy the argument both ways. It basically comes down to the invariants we want to provide:

- invariant for "minimum 0 inlined elements" is "sizeof(SmallVector<T>) <= 64"
  - this in theory provides stronger invariants related to excessive memory usage from the inlined elements
- invariant for "minimum 1 inlined elements" is "at least one inlined element, possibly more if we can fit it in 64 bytes"
  - this avoids confusion of SmallVector<T> not having inlined elements

I don't think either choice really boxes us out of any future change or even matters much. So why don't we all rally around the 1 inlined element minimum case? Given the name "SmallVector" (which is a bad name, but folks will read it as "vector with inlined elements") it seems least surprising for now.

And in practice huge value types are very rare, so honestly I don't think that the invariant provided by "minimum 0" really buys us much in practice when viewed holistically. (also, both cases prevent SmallVector<SmallVector<T>> from multiplicatively exploding in size which makes me happy).

I actually was mildly leaning to the "minimum 0" side, but after writing the above I'm now leaning towards "minimum 1".

There's a larger discussion to be had here which I don't want to block this change on: Some folks (including me) believe that SmallVector has drifted from "used for small number of elements" and is in practice more like "LLVM's better vector" (including using 0 inline elements to replace std::vector) and the latter interpretation makes the "minimum 0" case make more sense. But we haven't codified the "use SmallVector / a new llvm::Vector unless you know you need std::vector" policy, and once we do then switching to the "minimum 0" case is pretty trivial. I'm very happy to have that discussion, but it seems incongruous to block the "default N" change on having that discussion.

-- Sean Silva

On Fri, Nov 27, 2020 at 8:45 PM Chris Lattner <clat...@nondot.org> wrote:

Duncan Exon Smith via llvm-dev

unread,
Nov 30, 2020, 11:30:20 PM11/30/20
to James Y Knight, LLVM Dev


On Nov 30, 2020, at 17:57, James Y Knight <jykn...@google.com> wrote:



Sure, once we’ve added noexcept to all types in LLVM/Clang/etc. That’s a pretty long tail though; a lot of work for relatively little gain given that we don’t care about exceptions anyway and we have an optimized vector implementation in tree. 

Michael Kruse via llvm-dev

unread,
Dec 1, 2020, 12:02:52 AM12/1/20
to Duncan P. N. Exon Smith, LLVM Dev
Am Mo., 30. Nov. 2020 um 19:34 Uhr schrieb Duncan P. N. Exon Smith
<dexon...@apple.com>:r<T, 0> has a number of benefits that make it

better in practice than std::vector, at least for LLVM code:
> - Interoperates with the pervasively used SmallVectorImpl<T>.
> - No exception guarantees, and thus none of the related, harmful pessimizations (std::vector::resize will do expensive copies instead of cheap moves in some cases, last I checked).
> - Customizations for using memcpy more often.
> - Smaller by a pointer for most `T` (64-bit pointers).

The LLVM project also hosts libc++, shouldn't we dogfood our own
implementations?

2 and 3 should be optimizations possible in the STL. I disagree that
SmallVectorImpl is pervasively used. The normal use case is a function
parameter where the caller creates a SmallVector just to pass it, and
can continue to do so with zero inline elements even if in other
places we use std::vector.

Michael

Mehdi AMINI via llvm-dev

unread,
Dec 1, 2020, 1:01:07 PM12/1/20
to Michael Kruse, LLVM Dev
On Mon, Nov 30, 2020 at 9:02 PM Michael Kruse via llvm-dev <llvm...@lists.llvm.org> wrote:
Am Mo., 30. Nov. 2020 um 19:34 Uhr schrieb Duncan P. N. Exon Smith
<dexon...@apple.com>:r<T, 0> has a number of benefits that make it
better in practice than std::vector, at least for LLVM code:
> - Interoperates with the pervasively used SmallVectorImpl<T>.
> - No exception guarantees, and thus none of the related, harmful pessimizations (std::vector::resize will do expensive copies instead of cheap moves in some cases, last I checked).
> - Customizations for using memcpy more often.
> - Smaller by a pointer for most `T` (64-bit pointers).

The LLVM project also hosts libc++, shouldn't we dogfood our own
implementations?

std::vector is limited by what the standard allows though. In particular in terms of exception safety (IIRC you can't use -fnoexception to change the behavior like moving instead of copying in general).
 

2 and 3 should be optimizations possible in the STL. I disagree that
SmallVectorImpl is pervasively used.

Any method that accepts a container and will do something like "push_back" on it has to either take SmallVectorImpl or be templated to accept std::vector and SmallVector.

Robinson, Paul via llvm-dev

unread,
Dec 1, 2020, 3:30:57 PM12/1/20
to Mehdi AMINI, Michael Kruse, llvm...@lists.llvm.org
>> 2 and 3 should be optimizations possible in the STL. I disagree that
>> SmallVectorImpl is pervasively used.
>
> Any method that accepts a container and will do something like "push_back"
> on it has to either take SmallVectorImpl or be templated to accept std::vector
> and SmallVector.

FTR, grepping llvm/include for SmallVectorImpl and filtering out ADT (because
that's going to be full of the implementation details) got 632 hits.

It's at least "widely used" if not "pervasively used."
--paulr

Chris Lattner via llvm-dev

unread,
Dec 1, 2020, 5:19:36 PM12/1/20
to Duncan Exon Smith, LLVM Dev
On Nov 17, 2020, at 1:42 PM, David Blaikie <dbla...@gmail.com> wrote:
Thoughts/suggestions:
- Adding the default seems very reasonable to me, and I think that 64 bytes is a good default.  I think you should change the behavior so that SmallVector<LargeThing> defaults to a single inline element instead of zero though.  Perhaps generate a static_assert when it is crazy large.

Out of curiosity: Why a single rather than zero?

My rationale for this is basically that SmallVector is typically used for the case when you want to avoid an out-of-line allocation for a small number of elements, this was the reason it was created.  While there is some performance benefits of SmallVector<T,0> over std::vector<> they are almost trivial.

The performance benefits aren't trivial.

std::vector grow operations will refuse to use std::move for some T, a pessimization required by its exception guarantees, even if you're building with `-fno-exceptions`. We had a massive compile-time problem in 2016 related to this that I fixed with 3c406c2da52302eb5cced431373f240b9c037841 by switching to SmallVector<T,0>. You can see the history in r338071 / 0f81faed05c3c7c1fbaf6af402411c99d715cf56.

That issue, at least, is fixable without switching from std::vector just by adding noexcept to the appropriate user-defined move constructors.

Sure, once we’ve added noexcept to all types in LLVM/Clang/etc. That’s a pretty long tail though; a lot of work for relatively little gain given that we don’t care about exceptions anyway and we have an optimized vector implementation in tree. 

Can you spell this out for me?  Why do we need noexcept if we’re building with -fno-exceptions?  What is going on here?

-Chris

Chris Lattner via llvm-dev

unread,
Dec 1, 2020, 5:33:13 PM12/1/20
to Sean Silva, llvm-dev
I feel that this thread has three different discussions going on, which should be separated out:

1) Should we forbid std::vector and use SmallVector<T,0> always?
2) What should the default argument of N be?
3) What should these types be called?

Unfortunately, these are all interrelated.

On Nov 30, 2020, at 6:04 PM, Sean Silva <chiso...@gmail.com> wrote:
I'm okay with either 1 or 0 inlined elements for huge value types.

I can buy the argument both ways. It basically comes down to the invariants we want to provide:

- invariant for "minimum 0 inlined elements" is "sizeof(SmallVector<T>) <= 64"
  - this in theory provides stronger invariants related to excessive memory usage from the inlined elements
- invariant for "minimum 1 inlined elements" is "at least one inlined element, possibly more if we can fit it in 64 bytes"
  - this avoids confusion of SmallVector<T> not having inlined elements

I think there are two different consistent design points depending on naming:

1) In a design like today where we have two names for “inlined vector” and “not inlined vector”, then I think it is important for “InlinedVector<T>” to have at least 1 element.  Defaulting to out of line when using the inline vector name betrays intention: it would be better to generate a compile time error or warning if the element size is too long, because the compiler engineer should use the out of line name.

2) In a design where we have "one name", then it makes more sense to have the default argument type be the “do what I mean” size, which defaults to zero if T is too large.


I’m pretty skeptical of #2 as an approach, because inline and out of line design points are pretty different when sizeof(T) is 4 or 8, and when sizeof(thevector) matters, e.g. in 2D cases.

I don't think either choice really boxes us out of any future change or even matters much. So why don't we all rally around the 1 inlined element minimum case? Given the name "SmallVector" (which is a bad name, but folks will read it as "vector with inlined elements") it seems least surprising for now.

And in practice huge value types are very rare, so honestly I don't think that the invariant provided by "minimum 0" really buys us much in practice when viewed holistically. (also, both cases prevent SmallVector<SmallVector<T>> from multiplicatively exploding in size which makes me happy).

I actually was mildly leaning to the "minimum 0" side, but after writing the above I'm now leaning towards "minimum 1".

Yes, I agree with this.

There's a larger discussion to be had here which I don't want to block this change on: Some folks (including me) believe that SmallVector has drifted from "used for small number of elements" and is in practice more like "LLVM's better vector" (including using 0 inline elements to replace std::vector) and the latter interpretation makes the "minimum 0" case make more sense. But we haven't codified the "use SmallVector / a new llvm::Vector unless you know you need std::vector" policy, and once we do then switching to the "minimum 0" case is pretty trivial. I'm very happy to have that discussion, but it seems incongruous to block the "default N" change on having that discussion.

I have found this discussion to be useful and interesting, I hadn’t thought about it this much at least.

Overall, I’d recommend a path like this:

1) We decide what to do about the default argument.  I agree with Sean that SmallVector<T> should default to 1 at the minimum, and produce an error or warning of T.  This makes sense given the bias towards a name that implies inline semantics.

2) We decide whether we want to ban std::vector in the LLVM code base by convention.  If so, I think that we should have a *separate* name for the out of line case, e.g. llvm::Vector<T>, which would be a good dual to llvm::SmallVector<T>.  If this is the end point (Duncan seems to think it should be) then we should make this part of the coding standard and move the code base towards it over time.

3) Finally, we decide if we want to rename SmallVector to something like InlineVector or IVector.

-Chris

Mehdi AMINI via llvm-dev

unread,
Dec 1, 2020, 5:40:18 PM12/1/20
to Chris Lattner, LLVM Dev
It isn't clear to me that -fno-exceptions can get the same benefit as noexcept for code that is written with exceptions in mind (I think https://stackoverflow.com/questions/61417534/can-stdvectort-use-ts-move-constructor-if-exceptions-are-disabled explains some of it).

While it is possible to write a library that optimizes for -fno-exceptions, I am not sure it would be standard-compliant for the STL to do so (and it'd be anyway another code path that the one written in term of `std::move_if_noexcept`).

Does it make sense or did you see it differently with your question?

-- 
Mehdi

Mehdi AMINI via llvm-dev

unread,
Dec 1, 2020, 5:53:05 PM12/1/20
to Chris Lattner, LLVM Dev
After digging a bit, I found that libcxx was changed last year to optimize with moving in std::vector when -fno-exceptions is present : https://reviews.llvm.org/D62228

That means that a copy-only class can't be used in vector anymore in clang-10 but it was possible in clang-9 (but only with -fno-exceptions): https://godbolt.org/z/M54bqn

-- 
Mehdi

Duncan P. N. Exon Smith via llvm-dev

unread,
Dec 1, 2020, 7:07:31 PM12/1/20
to Chris Lattner, LLVM Dev
+richard
Sure. It's a bit convoluted. Here's my understanding:

First, here's why std::vector has this behaviour:
- std::vector grow operations need to transfer their existing elements over to the new storage.
- The grow operations are usually required to meet the "strong exception guarantee": if something throws, this function has no effect.
- If move operations throw, you can't provide this guarantee unless you copy (you can't move back the elements that have been half-moved over, in case another exception is thrown; but if it was just a copy, the original storage still has the elements safely unmodified).
- There's a caveat / carve out, that if T cannot be copy-constructed AND T's move constructor is not noexcept, then the guarantee is waived (since there's no way to implement it).
- Implementation is to call std::move_if_noexcept (https://en.cppreference.com/w/cpp/utility/move_if_noexcept), which moves if it's a noexcept operation, or if T is not copy-constructible.

Second, here's why the behaviour doesn't change when -fno-exceptions:
- -fno-exceptions does NOT imply `noexcept` (maybe it should?, but it doesn't).
- This is implemented by detecting via SFINAE whether something is `noexcept` (maybe std::vector::resize/push_back/etc should have a special case? but that's controversial).

IMO, until all the C++ standard libraries and host compilers that we support being built with will consistently use std::move on grow operations in std::vector in -fno-exceptions mode, we should only use std::vector when we absolutely have to. It's not designed for -fno-exceptions codebases.

(
There's some discussion in this thread:

And Richard had a proposal that made sense to me:
I'm wondering if we should have an experimental
option to specify that functions are noexcept by default (overridable
by an explicit exception specification)

But that only fixes it in host compilers that implemented this experimental mode.
)

Duncan P. N. Exon Smith via llvm-dev

unread,
Dec 1, 2020, 7:13:55 PM12/1/20
to Mehdi Amini, LLVM Dev
Oh great, I didn't think this landed!

IMO, once all host toolchains we support have the same behaviour it would be safe to start using std::vector again (doing it sooner could give wildly divergent performance characteristics depending on what LLVM was built with).

Duncan P. N. Exon Smith via llvm-dev

unread,
Dec 1, 2020, 7:41:56 PM12/1/20
to Chris Lattner, Sean Silva, LLVM Dev
On Nov 30, 2020, at 6:04 PM, Sean Silva <chiso...@gmail.com> wrote:

I actually was mildly leaning to the "minimum 0" side, but after writing the above I'm now leaning towards "minimum 1".

I'm fine with this as well.

On 2020 Dec  1, at 14:33, Chris Lattner via llvm-dev <llvm...@lists.llvm.org> wrote:

Overall, I’d recommend a path like this:

This path SGTM.

1) We decide what to do about the default argument.  I agree with Sean that SmallVector<T> should default to 1 at the minimum, and produce an error or warning of T.  This makes sense given the bias towards a name that implies inline semantics.

As mentioned above, I think minimum 1 makes sense.

2) We decide whether we want to ban std::vector in the LLVM code base by convention.  If so, I think that we should have a *separate* name for the out of line case, e.g. llvm::Vector<T>, which would be a good dual to llvm::SmallVector<T>.  If this is the end point (Duncan seems to think it should be) then we should make this part of the coding standard and move the code base towards it over time.

(Yeah, I don't think we should allow use of std::vector; even though Mehdi discovered libc++ was changed, it'll be super frustrating to track down compile-time regressions that only occur on the non-libc++ bots or something; I don't think the problem goes away until we stop supporting being built with standard libraries without this)

James Y Knight via llvm-dev

unread,
Dec 1, 2020, 8:56:09 PM12/1/20
to Duncan Exon Smith, LLVM Dev
Why does this need a long tail? We have fancy ast refactoring tooling, and a single repository with all the code visible, after all. We can use thar to discover all of the missing noexcepts, and add them, all at once. And then use a clang tidy to help it remain true.

Chris Lattner via llvm-dev

unread,
Dec 2, 2020, 12:05:06 AM12/2/20
to Duncan P. N. Exon Smith, LLVM Dev
On Dec 1, 2020, at 4:07 PM, Duncan P. N. Exon Smith <dexon...@apple.com> wrote:

Can you spell this out for me?  Why do we need noexcept if we’re building with -fno-exceptions?  What is going on here?

Sure. It's a bit convoluted. Here's my understanding:

First, here's why std::vector has this behaviour:
- std::vector grow operations need to transfer their existing elements over to the new storage.
- The grow operations are usually required to meet the "strong exception guarantee": if something throws, this function has no effect.
- If move operations throw, you can't provide this guarantee unless you copy (you can't move back the elements that have been half-moved over, in case another exception is thrown; but if it was just a copy, the original storage still has the elements safely unmodified).
- There's a caveat / carve out, that if T cannot be copy-constructed AND T's move constructor is not noexcept, then the guarantee is waived (since there's no way to implement it).
- Implementation is to call std::move_if_noexcept (https://en.cppreference.com/w/cpp/utility/move_if_noexcept), which moves if it's a noexcept operation, or if T is not copy-constructible.

Second, here's why the behaviour doesn't change when -fno-exceptions:
- -fno-exceptions does NOT imply `noexcept` (maybe it should?, but it doesn't).
- This is implemented by detecting via SFINAE whether something is `noexcept` (maybe std::vector::resize/push_back/etc should have a special case? but that's controversial).

IMO, until all the C++ standard libraries and host compilers that we support being built with will consistently use std::move on grow operations in std::vector in -fno-exceptions mode, we should only use std::vector when we absolutely have to. It's not designed for -fno-exceptions codebases

Wow, thank you for the great explanation.  I agree with you that this seems like a pretty credible reason why we can’t depend on every host std::vector to do what we need, so we should use something like an llvm::Vector.

Two thoughts:

1) are you, or anyone else, interested in driving an llvm::Vector proposal + coding standard change to get us going in the right direction?  I don’t think we need a mass migration of the code base, just get the policy aligned right plus the new type name to exist.

2) I think we should pursue Richard’s proposal or something like it to make Clang provide better C++ performance out of the box.  I assume that all the STL containers will have similar issues (not to mention user defined ones), and noexcept isn’t widely used.  Some flag that turns noexcept on by default (or for everything even with an exception spec?) in -fno-exceptions mode seems like a great thing to do.

-Chris

Chris Lattner via llvm-dev

unread,
Dec 2, 2020, 12:07:10 AM12/2/20
to Duncan P. N. Exon Smith, LLVM Dev
Totally agree, it isn’t worth fighting this.  It is sad that C++ can’t even get std::vector right, but at least it as a language allows us pragmatically make progress here.

-Chris

James Y Knight via llvm-dev

unread,
Dec 2, 2020, 12:52:26 PM12/2/20
to Chris Lattner, LLVM Dev
On Wed, Dec 2, 2020 at 12:05 AM Chris Lattner <clat...@nondot.org> wrote:
On Dec 1, 2020, at 4:07 PM, Duncan P. N. Exon Smith <dexon...@apple.com> wrote:

Can you spell this out for me?  Why do we need noexcept if we’re building with -fno-exceptions?  What is going on here?

Sure. It's a bit convoluted. Here's my understanding:

First, here's why std::vector has this behaviour:
- std::vector grow operations need to transfer their existing elements over to the new storage.
- The grow operations are usually required to meet the "strong exception guarantee": if something throws, this function has no effect.
- If move operations throw, you can't provide this guarantee unless you copy (you can't move back the elements that have been half-moved over, in case another exception is thrown; but if it was just a copy, the original storage still has the elements safely unmodified).
- There's a caveat / carve out, that if T cannot be copy-constructed AND T's move constructor is not noexcept, then the guarantee is waived (since there's no way to implement it).
- Implementation is to call std::move_if_noexcept (https://en.cppreference.com/w/cpp/utility/move_if_noexcept), which moves if it's a noexcept operation, or if T is not copy-constructible.

Second, here's why the behaviour doesn't change when -fno-exceptions:
- -fno-exceptions does NOT imply `noexcept` (maybe it should?, but it doesn't).
- This is implemented by detecting via SFINAE whether something is `noexcept` (maybe std::vector::resize/push_back/etc should have a special case? but that's controversial).

IMO, until all the C++ standard libraries and host compilers that we support being built with will consistently use std::move on grow operations in std::vector in -fno-exceptions mode, we should only use std::vector when we absolutely have to. It's not designed for -fno-exceptions codebases

Wow, thank you for the great explanation.  I agree with you that this seems like a pretty credible reason why we can’t depend on every host std::vector to do what we need, so we should use something like an llvm::Vector.

I strongly disagree here. Not wanting to bother to add 'noexcept' to user-defined move-constructors is a poor justification for switching to a different vector type. Perhaps there are other reasons which might justify avoiding std::vector, but not that...

Chris Tetreault via llvm-dev

unread,
Dec 2, 2020, 1:36:23 PM12/2/20
to James Y Knight, Chris Lattner, LLVM Dev

   The LLVM codebase is designed with having exceptions disabled in mind. In such a world, having noexcept everywhere is just noise (“we don’t throw exceptions, so everything is noexcept! yay!”). Furthermore, we provide a CMake build flag to enable exceptions with the intention that user libs with exceptions enabled would link with LLVM and that user lib exceptions would flow through LLVM. If people start adding noexcept everywhere in order to please std::vector, then you will start seeing crashes.

 

  Since the LLVM codebase largely pretends that exceptions don’t exist, I don’t think it should do anything that will change what actually happens in the presence of exceptions. If this means that we prefer llvm::SmallVector to std::vector for performance reasons, then I say “so be it”.

 

Thanks,

   Christopher Tetreault

Mehdi AMINI via llvm-dev

unread,
Dec 2, 2020, 1:47:46 PM12/2/20
to James Y Knight, LLVM Dev
It isn't clear that it makes sense to just add "noexcept" everywhere in our codebase just because we build with -fno-exception. Should we decorate every single API everywhere? Which one actually matters?
Why is it better to add these everywhere in the codebase rather than just aligning our data-structure? (since there is also the `SmallVectorImpl` issue of function arguments...).

-- 
Mehdi

Duncan P. N. Exon Smith via llvm-dev

unread,
Dec 2, 2020, 3:39:49 PM12/2/20
to Chris Lattner, James Y Knight, LLVM Dev

On 2020 Dec  1, at 21:04, Chris Lattner <clat...@nondot.org> wrote:

1) are you, or anyone else, interested in driving an llvm::Vector proposal + coding standard change to get us going in the right direction?  I don’t think we need a mass migration of the code base, just get the policy aligned right plus the new type name to exist.

I'll try to get a minimal patch together with docs and send an RFC later this week.

(
I think the initial patch could be as simple as:
```
template <class T> using Vector = SmallVector<T, 0>;
```
but maybe we'd want to rename `SmallVectorImpl` to `VectorImpl`, or maybe there are other ideas, but off topic for this thread...
)

On 2020 Dec  2, at 09:51, James Y Knight <jykn...@google.com> wrote:

I strongly disagree here. Not wanting to bother to add 'noexcept' to user-defined move-constructors is a poor justification for switching to a different vector type. Perhaps there are other reasons which might justify avoiding std::vector, but not that...

I'll be sure to mention this alternative in the RFC for llvm::Vector; we can talk about it there; maybe you'll convince the rest of us to add the `noexcept`s everywhere and then the justification for llvm::Vector would indeed be weaker (not completely absent though)...

David Blaikie via llvm-dev

unread,
Dec 2, 2020, 6:52:38 PM12/2/20
to Duncan P. N. Exon Smith, LLVM Dev
So there's lots of fragments to this thread and a lot of emails & I think it might be simpler if I put thoughts in one place rather than replying to each subthread.

To Chris's comment(s)
1) In a design like today where we have two names for “inlined vector” and “not inlined vector”, then I think it is important for “InlinedVector<T>” to have at least 1 element.  Defaulting to out of line when using the inline vector name betrays intention: it would be better to generate a compile time error or warning if the element size is too long, because the compiler engineer should use the out of line name.

2) In a design where we have "one name", then it makes more sense to have the default argument type be the “do what I mean” size, which defaults to zero if T is too large.

I’m pretty skeptical of #2 as an approach, because inline and out of line design points are pretty different when sizeof(T) is 4 or 8, and when sizeof(thevector) matters, e.g. in 2D cases.

To me, this is more like std::string, which is (2) - and more, to me, about the contract of the type - std::string doesn't guarantee iterator validity over swap/move, where std::vector does. If std::vector didn't have this guarantee, it'd be possible to have small vector optimizations in std::vector the same way std::string does - without the user-facing customization, mind you.

So I think (2) has some legs - but the problem for me is that if we name the type llvm::Vector, which already gets a bit close to std::vector (unqualified "Vector" seems like fairly subtle code to me - but I'm leaning towards being OK with that aspect if other peopel are) and it has different iterator invalidation/move semantics than std::vector, I think that could be problematic. I don't know what to call it, though - I agree to some extent with (1) that calling such a thing SmallVector is a bit confusing - but I don't think it's that confusing, to my mind at least - it doesn't change the contract, and if the user isn't specifying the small buffer size, I think we could reasonably say "it's whatever size is good, and that might be zero" - and importantly if you change the size of the structs inside the SmallVector, it can change the number of elements (including down to, or up from, zero) - compared to having it break when you change the size of a struct, or have it have a weird single element that isn't what the user probably wanted, I think that (the ability to implicitly go to zero) would be a good thing.

To J. Y. Knight's comments:
Why does this need a long tail? We have fancy ast refactoring tooling, and a single repository with all the code visible, after all. We can use thar to discover all of the missing noexcepts, and add them, all at once. And then use a clang tidy to help it remain true.

At least to the best of my knowledge, I don't think we have a really nice clang-tidy integration to help check these things on an ongoing basis - catching them only in code review (seems we have some kind of clang-tidy integration in Phabricator) I think is inadequate. If there's some way to tie in clang-tidy to the build system so it could be enabled at the same level as clang warnings, I'd be up for that general idea, for sure! (I think there are maybe still some issues as Mehdi mentioned, with using that as the route forward for this particular issue, though - in terms of annotating all the necessary ctors with noexcept... but I'm not set against it, if it gets more consideration, I'd be up for giving it a closer look - I like the idea of writing conformantly well performing code using the standard library, for sure)
 
On Wed, Dec 2, 2020 at 12:39 PM Duncan P. N. Exon Smith via llvm-dev <llvm...@lists.llvm.org> wrote:


On 2020 Dec  1, at 21:04, Chris Lattner <clat...@nondot.org> wrote:

1) are you, or anyone else, interested in driving an llvm::Vector proposal + coding standard change to get us going in the right direction?  I don’t think we need a mass migration of the code base, just get the policy aligned right plus the new type name to exist.

I'll try to get a minimal patch together with docs and send an RFC later this week.

(
I think the initial patch could be as simple as:
```
template <class T> using Vector = SmallVector<T, 0>;
```
but maybe we'd want to rename `SmallVectorImpl` to `VectorImpl`, or maybe there are other ideas, but off topic for this thread...
)

Yeah, I'm of mixed feelings here - as mentioned above, naming is hard. SmallVector<.., 0> certainly has the iterator validity semantics to match std::vector - but I almost feel like /that/ semantic feature should be the exception here, and the more common name should cover the "use an inline buffer if it's useful" situation (including the possibility of that inline buffer being zero). But for the problem with then having a name really similar to std::vector but with differing semantics.

Maybe that's what it comes down to - the naming collission/semantic difference with std::vector is problematic, so SmallVector<0> is the LLVM std::vector (small sizeof footprint and iterator validity on move). Renaming SmallVector to somehow let it be more general if the "Small" is too confusing if the small buffer size could be implicitly zero (which I think would be valuable, even if we had llvm::Vector where 0 is guaranteed (small sizeof/+iterator validity on move) - because some/most uses don't need that guarantee but would still be better off not hardcoding "at least one" or "zero" size because fo the size of their elements today and would be better off with a floating guarantee that can pick implicitly based on the size of the struct on any given day/machine/etc).

- Dave
 

On 2020 Dec  2, at 09:51, James Y Knight <jykn...@google.com> wrote:

I strongly disagree here. Not wanting to bother to add 'noexcept' to user-defined move-constructors is a poor justification for switching to a different vector type. Perhaps there are other reasons which might justify avoiding std::vector, but not that...

I'll be sure to mention this alternative in the RFC for llvm::Vector; we can talk about it there; maybe you'll convince the rest of us to add the `noexcept`s everywhere and then the justification for llvm::Vector would indeed be weaker (not completely absent though)...

Sean Silva via llvm-dev

unread,
Dec 2, 2020, 7:19:26 PM12/2/20
to Duncan P. N. Exon Smith, LLVM Dev
On Tue, Dec 1, 2020 at 4:41 PM Duncan P. N. Exon Smith <dexon...@apple.com> wrote:
On Nov 30, 2020, at 6:04 PM, Sean Silva <chiso...@gmail.com> wrote:

I actually was mildly leaning to the "minimum 0" side, but after writing the above I'm now leaning towards "minimum 1".

I'm fine with this as well.

On 2020 Dec  1, at 14:33, Chris Lattner via llvm-dev <llvm...@lists.llvm.org> wrote:

Overall, I’d recommend a path like this:

This path SGTM.

1) We decide what to do about the default argument.  I agree with Sean that SmallVector<T> should default to 1 at the minimum, and produce an error or warning of T.  This makes sense given the bias towards a name that implies inline semantics.

As mentioned above, I think minimum 1 makes sense.

Thanks Duncan. Looks like we have reached consensus.

I've drafted a patch that adds the default, and added several folks from this thread as reviewers: https://reviews.llvm.org/D92522

-- Sean Silva

Chris Lattner via llvm-dev

unread,
Dec 3, 2020, 12:35:50 AM12/3/20
to Duncan P. N. Exon Smith, LLVM Dev

On Dec 2, 2020, at 12:39 PM, Duncan P. N. Exon Smith <dexon...@apple.com> wrote:



On 2020 Dec  1, at 21:04, Chris Lattner <clat...@nondot.org> wrote:

1) are you, or anyone else, interested in driving an llvm::Vector proposal + coding standard change to get us going in the right direction?  I don’t think we need a mass migration of the code base, just get the policy aligned right plus the new type name to exist.

I'll try to get a minimal patch together with docs and send an RFC later this week.

(
I think the initial patch could be as simple as:
```
template <class T> using Vector = SmallVector<T, 0>;
```
but maybe we'd want to rename `SmallVectorImpl` to `VectorImpl`, or maybe there are other ideas, but off topic for this thread...
)

Right, I think it is as simple as the using declaration, but also includes a revision to the coding standards and programmer manual dox.

I do think that renaming SmallVectorImpl to VectorImpl would make sense, that is something we can stage in over time (introduce the alternate name, rename everything in tree, then drop the old name in a few months).

-Chris

Chris Lattner via llvm-dev

unread,
Dec 3, 2020, 12:38:53 AM12/3/20
to David Blaikie, LLVM Dev
On Dec 2, 2020, at 3:52 PM, David Blaikie <dbla...@gmail.com> wrote:
I’m pretty skeptical of #2 as an approach, because inline and out of line design points are pretty different when sizeof(T) is 4 or 8, and when sizeof(thevector) matters, e.g. in 2D cases.

To me, this is more like std::string, which is (2) - and more, to me, about the contract of the type - std::string doesn't guarantee iterator validity over swap/move, where std::vector does. If std::vector didn't have this guarantee, it'd be possible to have small vector optimizations in std::vector the same way std::string does - without the user-facing customization, mind you.

Yeah I definitely prefer to use standard types if we can, I think there is compelling upthread rationale for why using them in general causes complexity.  Maybe as future C++ standards improve we can drop them.  I hope someday that ArrayRef and StringRef can drop away in time for example.

So I think (2) has some legs - but the problem for me is that if we name the type llvm::Vector, which already gets a bit close to std::vector (unqualified "Vector" seems like fairly subtle code to me - but I'm leaning towards being OK with that aspect if other peopel are) and it has different iterator invalidation/move semantics than std::vector, I think that could be problematic.

Not sure if I want to defend this, but we have lots of precedent for this, including llvm::sort etc.

llvm::Vector has some other nice but not compatible things, e.g. pop_back_val()

-Chris

David Blaikie via llvm-dev

unread,
Dec 3, 2020, 1:26:22 PM12/3/20
to Chris Lattner, LLVM Dev
On Wed, Dec 2, 2020 at 9:38 PM Chris Lattner <clat...@nondot.org> wrote:
On Dec 2, 2020, at 3:52 PM, David Blaikie <dbla...@gmail.com> wrote:
I’m pretty skeptical of #2 as an approach, because inline and out of line design points are pretty different when sizeof(T) is 4 or 8, and when sizeof(thevector) matters, e.g. in 2D cases.

To me, this is more like std::string, which is (2) - and more, to me, about the contract of the type - std::string doesn't guarantee iterator validity over swap/move, where std::vector does. If std::vector didn't have this guarantee, it'd be possible to have small vector optimizations in std::vector the same way std::string does - without the user-facing customization, mind you.

Yeah I definitely prefer to use standard types if we can, I think there is compelling upthread rationale for why using them in general causes complexity.  Maybe as future C++ standards improve we can drop them.  I hope someday that ArrayRef and StringRef can drop away in time for example.

Sorry - wasn't meaning to complain/rail with the "why not std::vector". I was meaning to highlight that I think a better conceptual abstraction than "small" (1 or more inlined objects) and "not small" (0 inlined objects) it'd be better to have an abstraction more like std::string - where the smallness isn't part of the API, as such. 

I'm suggesting that we should have two types:

Reduced sizeof(t) and guaranteed iterator validity over swap/move (ie: must not use an inline buffer)
The rest: Like std::string. It might have an inline buffer, it might have zero, depending on size, etc. But importantly it's not guaranteed to maintain iterator/pointer validity on move/swap, and sizeof is optimized for stack usage.

Neither of these really make sense being called "SmallVector" I suppose. And I'd dislike calling (2) "Vector" even though it's likely the more popular one - because it'd have subtly different semantics from std::vector regarding iterator invalidation.
So I think (2) has some legs - but the problem for me is that if we name the type llvm::Vector, which already gets a bit close to std::vector (unqualified "Vector" seems like fairly subtle code to me - but I'm leaning towards being OK with that aspect if other peopel are) and it has different iterator invalidation/move semantics than std::vector, I think that could be problematic.

Not sure if I want to defend this, but we have lots of precedent for this, including llvm::sort etc.

For sure - I'm not trying to advocate for avoiding having a std::vector replacement in LLVM (excuse the several negatives there). And some differences between the C++ standard APIs and the LLVM ones is expected - llvm::sort's and many of the other alternatives are fairly "obvious" I'd say, it sorts a range instead of a pair of iterators - I don't think anyone would find that surprising. Something with subtly different iterator/pointer invalidation semantics under a near exactly matching name wouldn't be such a good fit, though.
 
llvm::Vector has some other nice but not compatible things, e.g. pop_back_val()

I'm not too worried about some extra API surface area like that - but a difference in iterator invalidation (especially one that was dependent on the sizeof(T)) would be a bit subtle/concerning to me.



Ultimately, what I'm saying is: I think having the thing we currently call SmallVector generalised to "the thing that does not guarantee iterator validity on swap/move and may have a time/sizeof tradeoff that leans towards larger sizeof/faster time" (& such a thing could have an inline storage of 0 sometimes, if that's the right sizeof/speed tradeoff) would be a good thing - though the name might need changing if "Small" is misleading for such a generalisation.
& yeah, the straight up "llvm::Vector" name would be closer to/mostly matching std::vector's semantics with regard to iterator validity, etc.

Not sure if I'm making sense/just going around in circles at this point.

 - Dave

Sean Silva via llvm-dev

unread,
Dec 3, 2020, 1:45:47 PM12/3/20
to Chris Lattner, LLVM Dev
And we can drop the "Impl", since it is a type intended to be used by users :)

-- Sean Silva
 

-Chris

Chris Lattner via llvm-dev

unread,
Dec 5, 2020, 11:46:34 AM12/5/20
to Sean Silva, LLVM Dev


On Dec 3, 2020, at 10:45 AM, Sean Silva <chiso...@gmail.com> wrote:

I'll try to get a minimal patch together with docs and send an RFC later this week.

(
I think the initial patch could be as simple as:
```
template <class T> using Vector = SmallVector<T, 0>;
```
but maybe we'd want to rename `SmallVectorImpl` to `VectorImpl`, or maybe there are other ideas, but off topic for this thread...
)

Right, I think it is as simple as the using declaration, but also includes a revision to the coding standards and programmer manual dox.

I do think that renaming SmallVectorImpl to VectorImpl would make sense, that is something we can stage in over time (introduce the alternate name, rename everything in tree, then drop the old name in a few months).

And we can drop the "Impl", since it is a type intended to be used by users :)

I’m not opposed to renaming the Impl suffix to something better, but I don’t think we can make “Vector" == VectorImpl.  Can we?

-Chris

Chris Lattner via llvm-dev

unread,
Dec 5, 2020, 11:56:25 AM12/5/20
to David Blaikie, LLVM Dev
On Dec 3, 2020, at 10:26 AM, David Blaikie <dbla...@gmail.com> wrote:
I was meaning to highlight that I think a better conceptual abstraction than "small" (1 or more inlined objects) and "not small" (0 inlined objects) it'd be better to have an abstraction more like std::string - where the smallness isn't part of the API, as such. 

But we have SmallString as a dual to std::string precisely because we need that distinction.  Speaking of, the whole default argument thing should probably be applied to SmallString, SmallDenseMap and other types as well.

I'm suggesting that we should have two types:

Reduced sizeof(t) and guaranteed iterator validity over swap/move (ie: must not use an inline buffer)
The rest: Like std::string. It might have an inline buffer, it might have zero, depending on size, etc. But importantly it's not guaranteed to maintain iterator/pointer validity on move/swap, and sizeof is optimized for stack usage.

Neither of these really make sense being called "SmallVector" I suppose. And I'd dislike calling (2) "Vector" even though it's likely the more popular one - because it'd have subtly different semantics from std::vector regarding iterator invalidation.

I think I see what you’re going for here, and I agree that std::string is a slightly different situation than std::vector given that some implementations have a small string optimizations already.

I feel like you’re prioritizing the iterator invalidation behavior as the core difference between the two types, whereas I’m prioritizing the performance/allocation-behavior aspect.  I feel like this is a major difference in practice that API users should think about, and they should be prepared to deal with the iterator invalidation issues as necessary.

Is the "Reduced sizeof(t) and guaranteed iterator validity over swap/move (ie: must not use an inline buffer)” use case important enough to have a “string-y” type?  As you say, we don’t have a solution for this in the current code other than std::vector.  I haven’t heard of this being a significant enough problem to be worth “fixing”, and I don’t think we can drop the inline vs out-of-line distinction.

So I think (2) has some legs - but the problem for me is that if we name the type llvm::Vector, which already gets a bit close to std::vector (unqualified "Vector" seems like fairly subtle code to me - but I'm leaning towards being OK with that aspect if other peopel are) and it has different iterator invalidation/move semantics than std::vector, I think that could be problematic.

Not sure if I want to defend this, but we have lots of precedent for this, including llvm::sort etc.

For sure - I'm not trying to advocate for avoiding having a std::vector replacement in LLVM (excuse the several negatives there). And some differences between the C++ standard APIs and the LLVM ones is expected - llvm::sort's and many of the other alternatives are fairly "obvious" I'd say, it sorts a range instead of a pair of iterators - I don't think anyone would find that surprising. Something with subtly different iterator/pointer invalidation semantics under a near exactly matching name wouldn't be such a good fit, though.

The difference I meant was that it forwards transparently to array_pod_sort / qsort which is a pretty big behavioral difference.

-Chris

David Blaikie via llvm-dev

unread,
Dec 5, 2020, 1:47:14 PM12/5/20
to Chris Lattner, LLVM Dev
On Sat, Dec 5, 2020 at 8:56 AM Chris Lattner <clat...@nondot.org> wrote:
On Dec 3, 2020, at 10:26 AM, David Blaikie <dbla...@gmail.com> wrote:
I was meaning to highlight that I think a better conceptual abstraction than "small" (1 or more inlined objects) and "not small" (0 inlined objects) it'd be better to have an abstraction more like std::string - where the smallness isn't part of the API, as such. 

But we have SmallString as a dual to std::string precisely because we need that distinction.  Speaking of, the whole default argument thing should probably be applied to SmallString, SmallDenseMap and other types as well.

I'm suggesting that we should have two types:

Reduced sizeof(t) and guaranteed iterator validity over swap/move (ie: must not use an inline buffer)
The rest: Like std::string. It might have an inline buffer, it might have zero, depending on size, etc. But importantly it's not guaranteed to maintain iterator/pointer validity on move/swap, and sizeof is optimized for stack usage.

Neither of these really make sense being called "SmallVector" I suppose. And I'd dislike calling (2) "Vector" even though it's likely the more popular one - because it'd have subtly different semantics from std::vector regarding iterator invalidation.

I think I see what you’re going for here, and I agree that std::string is a slightly different situation than std::vector given that some implementations have a small string optimizations already.

I feel like you’re prioritizing the iterator invalidation behavior as the core difference between the two types, whereas I’m prioritizing the performance/allocation-behavior aspect.  I feel like this is a major difference in practice that API users should think about, and they should be prepared to deal with the iterator invalidation issues as necessary.

Is the "Reduced sizeof(t) and guaranteed iterator validity over swap/move (ie: must not use an inline buffer)” use case important enough to have a “string-y” type?

I think things got confused again - reduced sizeof/guaranteed iterator validity isn't a string-y type (because of the iterator validity guarantee such a type can't use an inline buffer - unlike std::string), it's what we're talking about/potentially calling llvm::Vector.
 
 As you say, we don’t have a solution for this in the current code other than std::vector.  I haven’t heard of this being a significant enough problem to be worth “fixing”, and I don’t think we can drop the inline vs out-of-line distinction.

Hmm - I think I'm still miscommunicating/not making sense.

I think there's value in having a type that has a guarantee of no-inline-storage - when you want to move and maintain iterator validity.
But I'm wondering if the type with possibly using inline storage could be thought of more like std::string (ie: define the contract: iterators invalidated on move) rather than like SmallVector (ie: "must have an inline buffer"). I don't know what we would call this thing - but my point is to try to move away from the name dictating the implementation detail of having an inline buffer (we could still have SmallVector<T, N> for when you specifically want an inline buffer) but OtherThingVector<T> would not require an inline buffer it would be "a type without iterator validity on move that has, or doesn't have, an inline buffer as it chooses to based on performance tradeoffs".

I think, to me, this would be better than "SmallVector<T> must have at least 1 inline element because of the word "Small" in the name" if we moved away from having Small in the name, and towards some sense of the contract & leaving the implementation to make the choice of how to best implement that contract.

But I think I'm going around in circles/not necessarily making sense here. Perhaps a chat on discord or something would be more effective?
So I think (2) has some legs - but the problem for me is that if we name the type llvm::Vector, which already gets a bit close to std::vector (unqualified "Vector" seems like fairly subtle code to me - but I'm leaning towards being OK with that aspect if other peopel are) and it has different iterator invalidation/move semantics than std::vector, I think that could be problematic.

Not sure if I want to defend this, but we have lots of precedent for this, including llvm::sort etc.

For sure - I'm not trying to advocate for avoiding having a std::vector replacement in LLVM (excuse the several negatives there). And some differences between the C++ standard APIs and the LLVM ones is expected - llvm::sort's and many of the other alternatives are fairly "obvious" I'd say, it sorts a range instead of a pair of iterators - I don't think anyone would find that surprising. Something with subtly different iterator/pointer invalidation semantics under a near exactly matching name wouldn't be such a good fit, though.

The difference I meant was that it forwards transparently to array_pod_sort / qsort which is a pretty big behavioral difference.

Ah, more llvm::sort(iter, iter) - yeah, hadn't thought about that one. Different performance characteristics, but the same contract, right? So this is working around sub-optimal standard library implementations, but not providing a different contract/requirements, is it?

I think the distinction I'm trying to draw here - is that we're not talking about "a better std::vector" - it's something with a different contract. The thing we're talking about as llvm::Vector is pretty close to "a better std::vector" within the "a world without exceptions" alternate reality of C++ that the C++ standard doesn't acknowledge.

But SmallVector isn't like that, it has a difference in the contract (iterator invalidation being the major part of that, in terms of observable contract - similarly with DenseMap having such iterator invalidation (& the need for a tombstone and empty value)) - and I'm trying to suggest that maybe it's worth thinking about in a way that's different from the way it seems like we've historically thought about it.

Chris Lattner via llvm-dev

unread,
Dec 5, 2020, 3:00:09 PM12/5/20
to David Blaikie, LLVM Dev
On Dec 5, 2020, at 10:46 AM, David Blaikie <dbla...@gmail.com> wrote:
I think I see what you’re going for here, and I agree that std::string is a slightly different situation than std::vector given that some implementations have a small string optimizations already.

I feel like you’re prioritizing the iterator invalidation behavior as the core difference between the two types, whereas I’m prioritizing the performance/allocation-behavior aspect.  I feel like this is a major difference in practice that API users should think about, and they should be prepared to deal with the iterator invalidation issues as necessary.

Is the "Reduced sizeof(t) and guaranteed iterator validity over swap/move (ie: must not use an inline buffer)” use case important enough to have a “string-y” type?

I think things got confused again - reduced sizeof/guaranteed iterator validity isn't a string-y type (because of the iterator validity guarantee such a type can't use an inline buffer - unlike std::string), it's what we're talking about/potentially calling llvm::Vector.

Ok, I thought you were talking about bringing that idea to string-y types because std::string doesn’t provide it..
 
 As you say, we don’t have a solution for this in the current code other than std::vector.  I haven’t heard of this being a significant enough problem to be worth “fixing”, and I don’t think we can drop the inline vs out-of-line distinction.

Hmm - I think I'm still miscommunicating/not making sense.

I think there's value in having a type that has a guarantee of no-inline-storage - when you want to move and maintain iterator validity.

Right, in the vector domain, this is the llvm::Vector type(def).

But I'm wondering if the type with possibly using inline storage could be thought of more like std::string (ie: define the contract: iterators invalidated on move) rather than like SmallVector (ie: "must have an inline buffer"). I don't know what we would call this thing - but my point is to try to move away from the name dictating the implementation detail of having an inline buffer (we could still have SmallVector<T, N> for when you specifically want an inline buffer) but OtherThingVector<T> would not require an inline buffer it would be "a type without iterator validity on move that has, or doesn't have, an inline buffer as it chooses to based on performance tradeoffs".

I think, to me, this would be better than "SmallVector<T> must have at least 1 inline element because of the word "Small" in the name" if we moved away from having Small in the name, and towards some sense of the contract & leaving the implementation to make the choice of how to best implement that contract.

But I think I'm going around in circles/not necessarily making sense here. Perhaps a chat on discord or something would be more effective?

I think I understand what you’re saying: you’re focusing on making iterator invalidation semantics primary instead of memory layout.  This is a reasonable thing to do, but I don’t think we should go with a design that ignores the “in place ness”.  Maybe there is a “best of both worlds” design proposal that could make sense here?

The difference I meant was that it forwards transparently to array_pod_sort / qsort which is a pretty big behavioral difference.

Ah, more llvm::sort(iter, iter) - yeah, hadn't thought about that one. Different performance characteristics, but the same contract, right? So this is working around sub-optimal standard library implementations, but not providing a different contract/requirements, is it?

It’s really a “similar but different” API.  I works the same in the easy cases, but doesn’t have the same set of contracts, that was my only analogy.   There are lots of other APIs that are similar-but-different in innumerable ways that provide tradeoffs e.g. DenseMap vs std::map, etc.  The programmer’s manual has a long list of options :-)

-Chris


David Blaikie via llvm-dev

unread,
Dec 5, 2020, 3:24:44 PM12/5/20
to Chris Lattner, LLVM Dev
On Sat, Dec 5, 2020 at 12:00 PM Chris Lattner <clat...@nondot.org> wrote:
On Dec 5, 2020, at 10:46 AM, David Blaikie <dbla...@gmail.com> wrote:
I think I see what you’re going for here, and I agree that std::string is a slightly different situation than std::vector given that some implementations have a small string optimizations already.

I feel like you’re prioritizing the iterator invalidation behavior as the core difference between the two types, whereas I’m prioritizing the performance/allocation-behavior aspect.  I feel like this is a major difference in practice that API users should think about, and they should be prepared to deal with the iterator invalidation issues as necessary.

Is the "Reduced sizeof(t) and guaranteed iterator validity over swap/move (ie: must not use an inline buffer)” use case important enough to have a “string-y” type?

I think things got confused again - reduced sizeof/guaranteed iterator validity isn't a string-y type (because of the iterator validity guarantee such a type can't use an inline buffer - unlike std::string), it's what we're talking about/potentially calling llvm::Vector.

Ok, I thought you were talking about bringing that idea to string-y types because std::string doesn’t provide it..

Oh, right, sorry, yeah - not proposing any new string abstractions. Instead trying to draw analogy between how std::string is specified in the standard with how (name notwithstanding) SmallVector could be specified. Mostly I just don't have a good name for the entity I'm trying to describe. "Something like std::vector, but with the iterator invalidation semantics of std::string such that this thing can use a small inline buffer if it's beneficial" without it /having/ to use such an inline buffer, if, say, the sizeof(T) is too large to be reasonable. And that I think this would be a good thing so users could describe their willingness to accept those semantics "I guarantee I don't care about iterator validity over move" and then the implementation would "do the right thing" given that constraint - if T happens to be large, it'd potentially do the same thing as llvm::Vector, but the user wouldn't have to look at the sizeof(T) and bake that decision in, instead letting the cointainer make that choice and letting that choice remain correct/up-to-date as the shape/size of T changes over time. I think this would make the code more maintainable - the likely-right implementation tradeoffs happen as code changes, having to revisit the choice of data structure less often.
 
 
 As you say, we don’t have a solution for this in the current code other than std::vector.  I haven’t heard of this being a significant enough problem to be worth “fixing”, and I don’t think we can drop the inline vs out-of-line distinction.

Hmm - I think I'm still miscommunicating/not making sense.

I think there's value in having a type that has a guarantee of no-inline-storage - when you want to move and maintain iterator validity.

Right, in the vector domain, this is the llvm::Vector type(def).

But I'm wondering if the type with possibly using inline storage could be thought of more like std::string (ie: define the contract: iterators invalidated on move) rather than like SmallVector (ie: "must have an inline buffer"). I don't know what we would call this thing - but my point is to try to move away from the name dictating the implementation detail of having an inline buffer (we could still have SmallVector<T, N> for when you specifically want an inline buffer) but OtherThingVector<T> would not require an inline buffer it would be "a type without iterator validity on move that has, or doesn't have, an inline buffer as it chooses to based on performance tradeoffs".

I think, to me, this would be better than "SmallVector<T> must have at least 1 inline element because of the word "Small" in the name" if we moved away from having Small in the name, and towards some sense of the contract & leaving the implementation to make the choice of how to best implement that contract.

But I think I'm going around in circles/not necessarily making sense here. Perhaps a chat on discord or something would be more effective?

I think I understand what you’re saying: you’re focusing on making iterator invalidation semantics primary instead of memory layout.

In the sense that the difference in iterator invalidation semantics are the element that allows the implementation to make different choices in memory layout. 
 
 This is a reasonable thing to do, but I don’t think we should go with a design that ignores the “in place ness”.

I'm suggesting somewhat ignoring the "in place nesss" in terms of the API design, but not implementation. std::string's spec doesn't speak about "in place ness", but it does talk about iterator invalidation & specifies it in a way that allows "in place ness" & leaves the implementation scope to choose what that looks like - how large the in place buffer is, how it's implemented (do you have a pointer that points back into the object itself for the in-place buffer, or some bit squirreled away to determine big/small and allow use of the pointer bits for inline storage for more compact/longer effective inline storage, etc).

That doesn't mean the implementation choices are totally invisible, and the spec doesn't tend to talk about the sizeof of C++ standard library types - a std::string could have sizeof 512 and be conforming, but would be gratuitously suboptimal, but we're generally accepting that it maybe sizeof > 3 words but not hugely moreso.

I don't think we should move to a point where we don't have the ability to specify the inline buffer size when it's worth doing so for optimization purposes. But for when the size isn't being specified, I think it may be more maintainable if it's thought of more about the contract and let the implementation choose - including possibly choosing 0.
 
 Maybe there is a “best of both worlds” design proposal that could make sense here?

The difference I meant was that it forwards transparently to array_pod_sort / qsort which is a pretty big behavioral difference.

Ah, more llvm::sort(iter, iter) - yeah, hadn't thought about that one. Different performance characteristics, but the same contract, right? So this is working around sub-optimal standard library implementations, but not providing a different contract/requirements, is it?

It’s really a “similar but different” API.  I works the same in the easy cases, but doesn’t have the same set of contracts, that was my only analogy.  

Ah, perhaps I'm missing the detail: What part of the contract is different between llvm::sort(iter, iter) and std::sort(iter, iter)? 
 
There are lots of other APIs that are similar-but-different in innumerable ways that provide tradeoffs e.g. DenseMap vs std::map, etc.  The programmer’s manual has a long list of options :-)

For sure

- Dave 
Reply all
Reply to author
Forward
0 new messages