I think someone (JYKnight, perhaps) mentioned in the code review
(always difficult fragmenting the discussion between code review and
RFC, unfortunately - not sure there's a great solution to that - some
way to lock comments on a Phab review might be nice) that there are
cases where you do want a small inline buffer even when you're nested
inside another data structure and/or heap allocated (like tail
allocations).
Got any sense of the total value here? Major savings to be had?
(if SmallVector<T> can do what your proposed SVec<T> does, that leaves
the Vec<T> - could you expound on the benefits of SmallVector<T, 0>
over std::vector<T>? I guess the SmallVectorImpl generic algorithm
opportunities? Though that's rarely needed compared to ArrayRef.)
If SmallVector<T> would suffice then maybe Vec<T> could be
ZeroSmallVector<T>? Not sure.
> _______________________________________________
> LLVM Developers mailing list
> llvm...@lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
I will say I'm not a huge fan of adding even more names for things in
this fairly core/common use case (now we'll have even more vector
names to pick from) - can we use default template arguments so we can
write SmallVector<T> instead of SmallVector<T, N> and would that
address some of the use cases proposed here?
I think someone (JYKnight, perhaps) mentioned in the code review
(always difficult fragmenting the discussion between code review and
RFC, unfortunately - not sure there's a great solution to that - some
way to lock comments on a Phab review might be nice) that there are
cases where you do want a small inline buffer even when you're nested
inside another data structure and/or heap allocated (like tail
allocations).
Got any sense of the total value here? Major savings to be had?
(if SmallVector<T> can do what your proposed SVec<T> does, that leaves
the Vec<T> - could you expound on the benefits of SmallVector<T, 0>
over std::vector<T>? I guess the SmallVectorImpl generic algorithm
opportunities? Though that's rarely needed compared to ArrayRef.)
If SmallVector<T> would suffice then maybe Vec<T> could be
ZeroSmallVector<T>? Not sure.
I'm not quite following here - what sort of problems do you anticipate
by readers missing the default value for N?
> To me the drawbacks are outweighing the benefits too much.
> Also, `SmallVector<LargeType>` would end-up with N==0 implicitly, without an easy way to figure it out that there is no actual inline storage while reading the code.
Why would it be necessary to figure that out? If we generally feel the
right default inline storage happens to be zero in that case and that
SmallVector will pick a good default (including possibly 0) that seems
like a good thing to me. (why would we call out zero specially, if we
want to avoid calling out other values explicitly)
> An alternative was to reserve the default to only "small object" so that N isn't zero, but there isn't a platform independent way of doing that and keep the code portable I believe. So `SVec<T>` is really saying: "I am willing to pay for some limite inline storage if possible but I don't have a N in mind".
That quoted statement still sounds like it could encapsulate the
possibility of 0 too, though. "limited inline storage if possible"
could be "and the answer to that constraint is zero in this case - but
if you happen to make the T smaller, it could become non-zero
organically/without the need to touch this code" (as the N could vary
organically between non-zero values as struct sizes change too)
> Finally the simple `llvm::Vector` case to replace `SmallVector<T, 0>` is because it is frequently preferable to `std::vector` but still isn't readable or immediately intuitive and so is rarely used in practice (see https://www.llvm.org/docs/ProgrammersManual.html#llvm-adt-smallvector-h for the documented points on N=0).
I'm not sure llvm::Vector/Vec would necessarily make it more often
used in practice. Maybe a bunch of cleanup and more specific wording
in the style guide that would ban std::vector might help more there.
Though I agree that it may be suitable to have a clear name for
SmallVector<T, 0> since the "Small" is slightly misleading (though not
entirely - it's important for the "SmallVectorImpl" benefits of
SmallVector - so renaming it to Vector and still using it via
SmallVectorImpl<T>& might also be confusing).
>> Got any sense of the total value here? Major savings to be had?
>>
>> (if SmallVector<T> can do what your proposed SVec<T> does, that leaves
>> the Vec<T> - could you expound on the benefits of SmallVector<T, 0>
>> over std::vector<T>? I guess the SmallVectorImpl generic algorithm
>> opportunities? Though that's rarely needed compared to ArrayRef.)
>> If SmallVector<T> would suffice then maybe Vec<T> could be
>> ZeroSmallVector<T>? Not sure.
>
>
> ZeroSmallVector<T> does not really address your "more vector names to pick from" concerns,
Somewhat - if SmallVector<T> can be used instead of SVec<T> then we
have one fewer name and less convention to base the Vec<T> on (since
it won't have the SVec<T> sibling), so picking a name closer to the
existing SmallVector might be more viable/helpful.
> and it is longer than `SmallVector<T, 0>`: shouldn't we aim for the "default case" to be the easiest to reach / most intuitive to pick? `llvm::Vec` looks like "just a vector".
Somewhat - but Vec is an abbreviation that isn't really used otherwise
(if we consider SVec as well, I'd still push back on the introduction
of the abbreviation for both cases) and llvm::Vector would be only a
case separation away form a standard name (when considering the
unqualified name - which is likely to come up a fair bit, as we'll see
"Vector" used unqualified a lot).
I wouldn't entirely object to SmallVector<T> getting a smart default
buffer size, and llvm::Vector being an alias for SmallVector<T, 0> - I
don't feel /super/ great about it, but yeah, if we're going to push
the Programmer's Manual encouragement further in terms of
reducing/removing use of std::vector, then maybe llvm::Vector isn't
the worst way to do that.
(it might be that this would benefit from being two separate
discussions, though - one on how to provide smart defaults for
SmallVector<T> and one on if/how to push std::vector deprecation in
favor of SmallVector<T, 0> (semantically speaking - no matter how it's
spelled) further along)
On Mon, Nov 16, 2020 at 1:55 PM Mehdi AMINI <joke...@gmail.com> wrote:
> On Mon, Nov 16, 2020 at 12:55 PM David Blaikie <dbla...@gmail.com> wrote:
>>
>> I will say I'm not a huge fan of adding even more names for things in
>> this fairly core/common use case (now we'll have even more vector
>> names to pick from) - can we use default template arguments so we can
>> write SmallVector<T> instead of SmallVector<T, N> and would that
>> address some of the use cases proposed here?
>
>
> I won't claim it is perfect, but the added names are a compromise over rounds of reviews with the folks in the revision. In particular I'm quite concerned that a default value for N on the SmallVector does not carry the intent the same way, and is too easy to miss in review (or while reading code).
I'm not quite following here - what sort of problems do you anticipate
by readers missing the default value for N?
> To me the drawbacks are outweighing the benefits too much.
> Also, `SmallVector<LargeType>` would end-up with N==0 implicitly, without an easy way to figure it out that there is no actual inline storage while reading the code.
Why would it be necessary to figure that out? If we generally feel the
right default inline storage happens to be zero in that case and that
SmallVector will pick a good default (including possibly 0) that seems
like a good thing to me. (why would we call out zero specially, if we
want to avoid calling out other values explicitly)
> An alternative was to reserve the default to only "small object" so that N isn't zero, but there isn't a platform independent way of doing that and keep the code portable I believe. So `SVec<T>` is really saying: "I am willing to pay for some limite inline storage if possible but I don't have a N in mind".
That quoted statement still sounds like it could encapsulate the
possibility of 0 too, though. "limited inline storage if possible"
could be "and the answer to that constraint is zero in this case - but
if you happen to make the T smaller, it could become non-zero
organically/without the need to touch this code" (as the N could vary
organically between non-zero values as struct sizes change too)
and llvm::Vector would be only a
case separation away form a standard name (when considering the
unqualified name - which is likely to come up a fair bit, as we'll see
"Vector" used unqualified a lot).
I wouldn't entirely object to SmallVector<T> getting a smart default
buffer size, and llvm::Vector being an alias for SmallVector<T, 0> - I
don't feel /super/ great about it, but yeah, if we're going to push
the Programmer's Manual encouragement further in terms of
reducing/removing use of std::vector, then maybe llvm::Vector isn't
the worst way to do that.
(it might be that this would benefit from being two separate
discussions, though - one on how to provide smart defaults for
SmallVector<T> and one on if/how to push std::vector deprecation in
favor of SmallVector<T, 0> (semantically speaking - no matter how it's
spelled) further along)
I don't know why it being implicitly zero is noteworthy - anymore than
it being implicitly 1 or 3, etc.
As for whether it's intentional: I think if we're moving in this
direction that's proposed, the idea is that by default we don't want
to be explicit about the N - so in general we won't be. And sometimes
someone will want to be explicit and add an N, but otherwise the
reasonable default is not to have an N. I think the intent is clear
enough - consider other default template parameters like customized
deleters on a unique_ptr: I don't wonder if someone intended to have a
customized deleter on every unique_ptr, one assumes that wasn't
required/intended unless it's added. It seems like the idea is for
non-explicit N to be a safe/common default, and explicit N to be
noteworthy/require some justification or scrutiny.
>> > To me the drawbacks are outweighing the benefits too much.
>> > Also, `SmallVector<LargeType>` would end-up with N==0 implicitly, without an easy way to figure it out that there is no actual inline storage while reading the code.
>>
>> Why would it be necessary to figure that out? If we generally feel the
>> right default inline storage happens to be zero in that case and that
>> SmallVector will pick a good default (including possibly 0) that seems
>> like a good thing to me. (why would we call out zero specially, if we
>> want to avoid calling out other values explicitly)
>
>
> I think this is an API that is "easy to misuse", I don't see any advantage to it while it has drawbacks: why would we do that?
What kind of misuse do you have in mind? To me it seems like it would
be consistent with the idea that we have built rules about what good
default inline sizes are - and there's no need for us to think about
it on every use, we just let the default do what it's meant to do.
>> > An alternative was to reserve the default to only "small object" so that N isn't zero, but there isn't a platform independent way of doing that and keep the code portable I believe. So `SVec<T>` is really saying: "I am willing to pay for some limite inline storage if possible but I don't have a N in mind".
>>
>> That quoted statement still sounds like it could encapsulate the
>> possibility of 0 too, though. "limited inline storage if possible"
>> could be "and the answer to that constraint is zero in this case - but
>> if you happen to make the T smaller, it could become non-zero
>> organically/without the need to touch this code" (as the N could vary
>> organically between non-zero values as struct sizes change too)
>
> Yes the quoted statement says exactly that!
Sorry, it seems we're jumbling up the two issues: Whether the type
name should be different when asking for implicit default inline
storage length and, separately, whether explicit zero length storage
should use a different name. The discussion above, I thought, was
about the latter, not the former - but your response seeems to be
about the former rather than the latter.
Two separate discussions/threads/patches may help keep the
communication more clear.
> This is why I don't like having this behavior on SmallVector: this is a different semantics / intent that the programmers have and this is something that I like being able to read differently.
> SVec does not accept a `N`: if I read SVec<Type> there can't be a possible accidental omission of N here. It has to be a choice between `SmallVector<SomeType, 4>` and `SVec<SomeType>` but not `SmallVector<SomeType>`.
To discuss this issue of whether accepting a default size versus
having an explicit size is a different semantic - as above, I think
it's worth considering/comparing this to other templates and their
default template arguments. Things like std::vector's customizable
allocators (you. don't have to use a different name for the container
when you specify a custom allocation scheme for std::vector - but we
don't wonder every time we see std::vector<T> whether the user meant
to customize the allocator - we accept the default is usualy intended
(as the default buffer size would be usually intended) unless it's
specified - if during review we think a custom allocator (or
non-default buffer size) might be merited, we could ask about it - we
probably would even if the user had written SVec<T> the same way we
ask about other data structures today "did you mean SmallVector<T,
3>"?), unique_ptr's custom deleters, etc.
Yeah - certainly a tricky set of tradeoffs (autocomplete, similarity
with standard names, etc). Perhaps a bit of a discussion of what other
libraries do around this (no doubt there are a bunch of C++ libraries
that have "here's the default vector you shuold use instead of
std::vector for these reasons" situations).
>> I wouldn't entirely object to SmallVector<T> getting a smart default
>> buffer size, and llvm::Vector being an alias for SmallVector<T, 0> - I
>> don't feel /super/ great about it, but yeah, if we're going to push
>> the Programmer's Manual encouragement further in terms of
>> reducing/removing use of std::vector, then maybe llvm::Vector isn't
>> the worst way to do that.
>>
>> (it might be that this would benefit from being two separate
>> discussions, though - one on how to provide smart defaults for
>> SmallVector<T> and one on if/how to push std::vector deprecation in
>> favor of SmallVector<T, 0> (semantically speaking - no matter how it's
>> spelled) further along)
>
>
> For historical purpose: the review was actually originally only adding a default for SmallVector and nothing else :)
> We only converged to the current proposal after a few rounds of reviews trying to tradeoff amongst folks in the review.
Perhaps you could summarize some of those discussions/decisions in
more detail here? As the RFC stands, these are my comments - I know
it's always a tradeoff of which audiences are brought in at what
stages (though often folks send ADT/Support changes my way during
review - I should probably just set up a Herald rule for these
things).
Sounds like maybe I am in agreement with the original version of the
review, then.
- Dave
I appreciate the work you’ve done on this. My impression is that the thought process that goes into picking the size parameter is something like “Uhh… Uhh… I guess 4 is good?”, so doing work to formalize this is good. I also think that people are likely reluctant to use `SmallVector<T, 0>`. That said, I don’t think adding new types is useful.
I think `SmallVector` should just get a default size that is computed as `SVec`’s default size parameter is. Honestly, I’m unconcerned that anybody would “forget” to populate the size parameter. I really feel like any value that isn’t backed by hard data is unlikely to be better than the result of ` calculateSVecInlineElements<T>()`, and should be questioned in code review given the new heuristic you’ve added.
In short, I think that `SmallVector` should get a default size, and that the documentation should become something to the effect of “SmallVector should be preferred to std::vector unless you have a good reason to use the stl container. (api interop?). The default size for SmallVector has been chosen to be reasonable for most use cases. If you have evidence that shows the default is not optimal for your use case you can populate it with a more suitable value.”
Again, thanks for working on this. I’m looking forward to the day when I don’t have to pick a number out of my backside to put in that parameter.
Thanks,
Christopher Tetreault
>> > To me the drawbacks are outweighing the benefits too much.
>> > Also, `SmallVector<LargeType>` would end-up with N==0 implicitly, without an easy way to figure it out that there is no actual inline storage while reading the code.
>>
>> Why would it be necessary to figure that out? If we generally feel the
>> right default inline storage happens to be zero in that case and that
>> SmallVector will pick a good default (including possibly 0) that seems
>> like a good thing to me. (why would we call out zero specially, if we
>> want to avoid calling out other values explicitly)
>
>
> I think this is an API that is "easy to misuse", I don't see any advantage to it while it has drawbacks: why would we do that?
What kind of misuse do you have in mind?
To me it seems like it would
be consistent with the idea that we have built rules about what good
default inline sizes are - and there's no need for us to think about
it on every use, we just let the default do what it's meant to do.
>> > An alternative was to reserve the default to only "small object" so that N isn't zero, but there isn't a platform independent way of doing that and keep the code portable I believe. So `SVec<T>` is really saying: "I am willing to pay for some limite inline storage if possible but I don't have a N in mind".
>>
>> That quoted statement still sounds like it could encapsulate the
>> possibility of 0 too, though. "limited inline storage if possible"
>> could be "and the answer to that constraint is zero in this case - but
>> if you happen to make the T smaller, it could become non-zero
>> organically/without the need to touch this code" (as the N could vary
>> organically between non-zero values as struct sizes change too)
>
> Yes the quoted statement says exactly that!
Sorry, it seems we're jumbling up the two issues: Whether the type
name should be different when asking for implicit default inline
storage length and, separately, whether explicit zero length storage
should use a different name. The discussion above, I thought, was
about the latter, not the former - but your response seeems to be
about the former rather than the latter.
Two separate discussions/threads/patches may help keep the
communication more clear.
> This is why I don't like having this behavior on SmallVector: this is a different semantics / intent that the programmers have and this is something that I like being able to read differently.
> SVec does not accept a `N`: if I read SVec<Type> there can't be a possible accidental omission of N here. It has to be a choice between `SmallVector<SomeType, 4>` and `SVec<SomeType>` but not `SmallVector<SomeType>`.
To discuss this issue of whether accepting a default size versus
having an explicit size is a different semantic - as above, I think
it's worth considering/comparing this to other templates and their
default template arguments. Things like std::vector's customizable
allocators (you. don't have to use a different name for the container
when you specify a custom allocation scheme for std::vector - but we
don't wonder every time we see std::vector<T> whether the user meant
to customize the allocator - we accept the default is usualy intended
(as the default buffer size would be usually intended) unless it's
specified - if during review we think a custom allocator (or
non-default buffer size) might be merited, we could ask about it - we
probably would even if the user had written SVec<T> the same way we
ask about other data structures today "did you mean SmallVector<T,
3>"?), unique_ptr's custom deleters, etc.
>> I wouldn't entirely object to SmallVector<T> getting a smart default
>> buffer size, and llvm::Vector being an alias for SmallVector<T, 0> - I
>> don't feel /super/ great about it, but yeah, if we're going to push
>> the Programmer's Manual encouragement further in terms of
>> reducing/removing use of std::vector, then maybe llvm::Vector isn't
>> the worst way to do that.
>>
>> (it might be that this would benefit from being two separate
>> discussions, though - one on how to provide smart defaults for
>> SmallVector<T> and one on if/how to push std::vector deprecation in
>> favor of SmallVector<T, 0> (semantically speaking - no matter how it's
>> spelled) further along)
>
>
> For historical purpose: the review was actually originally only adding a default for SmallVector and nothing else :)
> We only converged to the current proposal after a few rounds of reviews trying to tradeoff amongst folks in the review.
Perhaps you could summarize some of those discussions/decisions in
more detail here? As the RFC stands, these are my comments - I know
it's always a tradeoff of which audiences are brought in at what
stages (though often folks send ADT/Support changes my way during
review - I should probably just set up a Herald rule for these
things).
Sounds like maybe I am in agreement with the original version of the
review, then.
>> I wouldn't entirely object to SmallVector<T> getting a smart default
>> buffer size, and llvm::Vector being an alias for SmallVector<T, 0> - I
>> don't feel /super/ great about it, but yeah, if we're going to push
>> the Programmer's Manual encouragement further in terms of
>> reducing/removing use of std::vector, then maybe llvm::Vector isn't
>> the worst way to do that.
>>
>> (it might be that this would benefit from being two separate
>> discussions, though - one on how to provide smart defaults for
>> SmallVector<T> and one on if/how to push std::vector deprecation in
>> favor of SmallVector<T, 0> (semantically speaking - no matter how it's
>> spelled) further along)
>
>
> For historical purpose: the review was actually originally only adding a default for SmallVector and nothing else :)
> We only converged to the current proposal after a few rounds of reviews trying to tradeoff amongst folks in the review.
Perhaps you could summarize some of those discussions/decisions in
more detail here? As the RFC stands, these are my comments - I know
it's always a tradeoff of which audiences are brought in at what
stages (though often folks send ADT/Support changes my way during
review - I should probably just set up a Herald rule for these
things).
Sounds like maybe I am in agreement with the original version of the
review, then.Likely: the review evolved because I opposed to changing the status quo in this direction I guess.
2. I don't understand the concern about "forgetting the N". In the "new world" of `SmallVector<T>`, any `SmallVector<T, N>` in new code would be treated as "I profiled this code and this explicit N is the right choice". That is, adding an explicit `N` is going to be rare and deliberate (possibly with reviewers insisting on a comment!) -- this is not something that one can "forget".
We've pretty happy now with a patch that adds two wrappers around SmallVector that make it 1) more convenient to use and 2) will tend to mitigate misuse of SmallVector. We think it's ready for wider discussion: https://reviews.llvm.org/D90884SVec<T> is a convenience alias for SmallVector<T, N> with N chosen automatically to keep its size under 64 Bytes (that heuristic is easy to change though). The recommendation in the patch is to use this "on the stack, where a "small" number of elements are expected".
Vec<T> is a convenience alias for SmallVector<T, 0>. It lets us get the (little-known?) benefits of SmallVector even when it has no inline elements (see https://llvm.org/docs/ProgrammersManual.html#llvm-adt-smallvector-h). The recommendation in the patch is to use this when the SmallVector is on the heap.
On Nov 13, 2020, at 2:06 PM, Sean Silva via llvm-dev <llvm...@lists.llvm.org> wrote:We've pretty happy now with a patch that adds two wrappers around SmallVector that make it 1) more convenient to use and 2) will tend to mitigate misuse of SmallVector. We think it's ready for wider discussion: https://reviews.llvm.org/D90884SVec<T> is a convenience alias for SmallVector<T, N> with N chosen automatically to keep its size under 64 Bytes (that heuristic is easy to change though). The recommendation in the patch is to use this "on the stack, where a "small" number of elements are expected".Hey Sean,I agree with other comments that his approach unnecessarily fragments the API surface area for a core class. You’re doing two things here: 1) adding a new name for an existing thing, and 2) adding a default.My major concern and objection is about #1, since the codebase will get to be a mishmash of the two. I’d rather keep the world consistent here and have an opinionated “one way to do it”.
Thoughts/suggestions:- Adding the default seems very reasonable to me, and I think that 64 bytes is a good default. I think you should change the behavior so that SmallVector<LargeThing> defaults to a single inline element instead of zero though. Perhaps generate a static_assert when it is crazy large.
Out of curiosity: Why a single rather than zero?
> - The name SmallVector has always been confusing, InlinedVector is more principled, but is even more verbose for such a common type. If you think it is worth shrinkifying the name SmallVector, then I think the best thing to do is *rename* SmallVector (perhaps to IVector?) everywhere in the codebase. I don’t think that introducing a redundant name is a good thing (except to smooth the transition).
>
> - If people are concerned about the default being bad, then you could choose to make “SmallVector<T, -1>” be the thing that autosizes. I tend to agree with your viewpoint that the N chosen is arbitrary in almost all cases anyway though, so I wouldn’t go this way.
>
>
> Vec<T> is a convenience alias for SmallVector<T, 0>. It lets us get the (little-known?) benefits of SmallVector even when it has no inline elements (see https://llvm.org/docs/ProgrammersManual.html#llvm-adt-smallvector-h). The recommendation in the patch is to use this when the SmallVector is on the heap.
>
>
> These benefits are tiny, I really don’t think it is worth introducing a new name for this. I think this is better handled with a change to CodingStandards (saying “don’t use std::vector”) and the programmers manual.
>
> -Chris
>
On Tue, Nov 17, 2020 at 7:26 AM Chris Lattner <clat...@nondot.org> wrote:On Nov 13, 2020, at 2:06 PM, Sean Silva via llvm-dev <llvm...@lists.llvm.org> wrote:We've pretty happy now with a patch that adds two wrappers around SmallVector that make it 1) more convenient to use and 2) will tend to mitigate misuse of SmallVector. We think it's ready for wider discussion: https://reviews.llvm.org/D90884SVec<T> is a convenience alias for SmallVector<T, N> with N chosen automatically to keep its size under 64 Bytes (that heuristic is easy to change though). The recommendation in the patch is to use this "on the stack, where a "small" number of elements are expected".Hey Sean,I agree with other comments that his approach unnecessarily fragments the API surface area for a core class. You’re doing two things here: 1) adding a new name for an existing thing, and 2) adding a default.My major concern and objection is about #1, since the codebase will get to be a mishmash of the two. I’d rather keep the world consistent here and have an opinionated “one way to do it”.Thanks Chris. That was my original proposal when I first drafted the patch, and I'm happy to just add the default.Thoughts/suggestions:- Adding the default seems very reasonable to me, and I think that 64 bytes is a good default. I think you should change the behavior so that SmallVector<LargeThing> defaults to a single inline element instead of zero though. Perhaps generate a static_assert when it is crazy large.That would work for me.Mehdi, would SmallVector<T>, defaulting to a minimum single inline element with static_assert(sizeof(T) < 512) address your concerns?
On 2020 Nov 17, at 13:40, Sean Silva <chiso...@gmail.com> wrote:On Tue, Nov 17, 2020 at 7:26 AM Chris Lattner <clat...@nondot.org> wrote:On Nov 13, 2020, at 2:06 PM, Sean Silva via llvm-dev <llvm...@lists.llvm.org> wrote:We've pretty happy now with a patch that adds two wrappers around SmallVector that make it 1) more convenient to use and 2) will tend to mitigate misuse of SmallVector. We think it's ready for wider discussion: https://reviews.llvm.org/D90884SVec<T> is a convenience alias for SmallVector<T, N> with N chosen automatically to keep its size under 64 Bytes (that heuristic is easy to change though). The recommendation in the patch is to use this "on the stack, where a "small" number of elements are expected".Hey Sean,I agree with other comments that his approach unnecessarily fragments the API surface area for a core class. You’re doing two things here: 1) adding a new name for an existing thing, and 2) adding a default.My major concern and objection is about #1, since the codebase will get to be a mishmash of the two. I’d rather keep the world consistent here and have an opinionated “one way to do it”.Thanks Chris. That was my original proposal when I first drafted the patch, and I'm happy to just add the default.Thoughts/suggestions:- Adding the default seems very reasonable to me, and I think that 64 bytes is a good default. I think you should change the behavior so that SmallVector<LargeThing> defaults to a single inline element instead of zero though. Perhaps generate a static_assert when it is crazy large.That would work for me.Mehdi, would SmallVector<T>, defaulting to a minimum single inline element with static_assert(sizeof(T) < 512) address your concerns?
-- Sean Silva- The name SmallVector has always been confusing, InlinedVector is more principled, but is even more verbose for such a common type. If you think it is worth shrinkifying the name SmallVector, then I think the best thing to do is *rename* SmallVector (perhaps to IVector?) everywhere in the codebase. I don’t think that introducing a redundant name is a good thing (except to smooth the transition).
On 2020 Nov 17, at 13:40, Sean Silva <chiso...@gmail.com> wrote:On Tue, Nov 17, 2020 at 7:26 AM Chris Lattner <clat...@nondot.org> wrote:On Nov 13, 2020, at 2:06 PM, Sean Silva via llvm-dev <llvm...@lists.llvm.org> wrote:We've pretty happy now with a patch that adds two wrappers around SmallVector that make it 1) more convenient to use and 2) will tend to mitigate misuse of SmallVector. We think it's ready for wider discussion: https://reviews.llvm.org/D90884SVec<T> is a convenience alias for SmallVector<T, N> with N chosen automatically to keep its size under 64 Bytes (that heuristic is easy to change though). The recommendation in the patch is to use this "on the stack, where a "small" number of elements are expected".Hey Sean,I agree with other comments that his approach unnecessarily fragments the API surface area for a core class. You’re doing two things here: 1) adding a new name for an existing thing, and 2) adding a default.My major concern and objection is about #1, since the codebase will get to be a mishmash of the two. I’d rather keep the world consistent here and have an opinionated “one way to do it”.Thanks Chris. That was my original proposal when I first drafted the patch, and I'm happy to just add the default.Thoughts/suggestions:- Adding the default seems very reasonable to me, and I think that 64 bytes is a good default. I think you should change the behavior so that SmallVector<LargeThing> defaults to a single inline element instead of zero though. Perhaps generate a static_assert when it is crazy large.That would work for me.Mehdi, would SmallVector<T>, defaulting to a minimum single inline element with static_assert(sizeof(T) < 512) address your concerns?I'd be a bit concerned about this for two reasons:- Adding a large inlined element by-default may reinforce one of the problems with SmallVector, that people accidentally put big vectors in places that shouldn't have them. If we're going to make accidentally large inlined vectors more ergonomic, I think we ought to make 0-element versions more ergonomic too.
- The `static_assert` is going to fail differently on different platforms, since `T`'s size will tend to be platform-dependent for large `T`. I'm not sure it'll be predictable enough.
-- Sean Silva- The name SmallVector has always been confusing, InlinedVector is more principled, but is even more verbose for such a common type. If you think it is worth shrinkifying the name SmallVector, then I think the best thing to do is *rename* SmallVector (perhaps to IVector?) everywhere in the codebase. I don’t think that introducing a redundant name is a good thing (except to smooth the transition).I like the idea of a minor rename, although a bit different:- SmallVectorBase => VectorBase- SmallVectorImpl => VectorImpl (we could keep both around for a transition period)- (etc.)- SmallVector => Vector, but give `Vector` a default small size of 0- Add SmallVector as a wrapper around Vector that has a nice default for the small size:```template <class T>using SmallVector = Vector<T, DefaultInlinedElementSize<T>>;```
I'm still not sure why "at least 1" is an important goal. I'd be
curious to hear from Chris/others why that's especially valuable.
If N is 0, then it's not a small-size optimized vector, in contrast to
what the name implies.
Actually, if the intention is to have no inlined elements, I think one
should use std:vector. Generally, projects should not reinvent STL
containers and prefer the standard library one unless there is a good
enough reason not to use them.
Michael
I agree for regular code, but SmallVector can be and is used in
templates as SmallVector<T, N> where T is not known to the code
writer. I'm also curious what makes the number 1 so special.
Cheers,
Nicolai
--
Lerne, wie die Welt wirklich ist,
aber vergiss niemals, wie sie sein sollte.
Thoughts/suggestions:
- Adding the default seems very reasonable to me, and I think that 64 bytes is a good default. I think you should change the behavior so that SmallVector<LargeThing> defaults to a single inline element instead of zero though. Perhaps generate a static_assert when it is crazy large.
Out of curiosity: Why a single rather than zero?
On Nov 25, 2020, at 10:55 PM, Michael Kruse via llvm-dev <llvm...@lists.llvm.org> wrote:Am Mi., 25. Nov. 2020 um 17:52 Uhr schrieb David Blaikie via llvm-dev
<llvm...@lists.llvm.org>:I'm still not sure why "at least 1" is an important goal. I'd be
curious to hear from Chris/others why that's especially valuable.
If N is 0, then it's not a small-size optimized vector, in contrast to
what the name implies.
On Nov 17, 2020, at 2:15 PM, Duncan P. N. Exon Smith <dexon...@apple.com> wrote:I'd be a bit concerned about this for two reasons:- Adding a large inlined element by-default may reinforce one of the problems with SmallVector, that people accidentally put big vectors in places that shouldn't have them. If we're going to make accidentally large inlined vectors more ergonomic, I think we ought to make 0-element versions more ergonomic too.
- The `static_assert` is going to fail differently on different platforms, since `T`'s size will tend to be platform-dependent for large `T`. I'm not sure it'll be predictable enough.
Sorry for falling off the map on this thread:On Nov 17, 2020, at 1:42 PM, David Blaikie <dbla...@gmail.com> wrote:Thoughts/suggestions:
- Adding the default seems very reasonable to me, and I think that 64 bytes is a good default. I think you should change the behavior so that SmallVector<LargeThing> defaults to a single inline element instead of zero though. Perhaps generate a static_assert when it is crazy large.
Out of curiosity: Why a single rather than zero?My rationale for this is basically that SmallVector is typically used for the case when you want to avoid an out-of-line allocation for a small number of elements, this was the reason it was created. While there is some performance benefits of SmallVector<T,0> over std::vector<> they are almost trivial. I don’t see why we’d recommend SmallVector<T,0> over vector to get those.
SmallVector<Type, 0>
to be preferred over std::vector<Type>
."> On 2020 Nov 25, at 22:55, Michael Kruse via llvm-dev <llvm...@lists.llvm.org> wrote:
>
> Am Mi., 25. Nov. 2020 um 17:52 Uhr schrieb David Blaikie via llvm-dev
> <llvm...@lists.llvm.org>:
>> I'm still not sure why "at least 1" is an important goal. I'd be
>> curious to hear from Chris/others why that's especially valuable.
>
> If N is 0, then it's not a small-size optimized vector, in contrast to
> what the name implies.
>
> Actually, if the intention is to have no inlined elements, I think one
> should use std:vector. Generally, projects should not reinvent STL
> containers and prefer the standard library one unless there is a good
> enough reason not to use them.
SmallVector<T, 0> has a number of benefits that make it better in practice than std::vector, at least for LLVM code:
- Interoperates with the pervasively used SmallVectorImpl<T>.
- No exception guarantees, and thus none of the related, harmful pessimizations (std::vector::resize will do expensive copies instead of cheap moves in some cases, last I checked).
- Customizations for using memcpy more often.
- Smaller by a pointer for most `T` (64-bit pointers).
Agreed that the name is wrong and not ergonomic, but we should not be using std::vector pretty much ever IMO.
On 2020 Nov 27, at 20:45, Chris Lattner via llvm-dev <llvm...@lists.llvm.org> wrote:Sorry for falling off the map on this thread:On Nov 17, 2020, at 1:42 PM, David Blaikie <dbla...@gmail.com> wrote:Thoughts/suggestions:
- Adding the default seems very reasonable to me, and I think that 64 bytes is a good default. I think you should change the behavior so that SmallVector<LargeThing> defaults to a single inline element instead of zero though. Perhaps generate a static_assert when it is crazy large.
Out of curiosity: Why a single rather than zero?My rationale for this is basically that SmallVector is typically used for the case when you want to avoid an out-of-line allocation for a small number of elements, this was the reason it was created. While there is some performance benefits of SmallVector<T,0> over std::vector<> they are almost trivial.
The LLVM project also hosts libc++, shouldn't we dogfood our own
implementations?
2 and 3 should be optimizations possible in the STL. I disagree that
SmallVectorImpl is pervasively used. The normal use case is a function
parameter where the caller creates a SmallVector just to pass it, and
can continue to do so with zero inline elements even if in other
places we use std::vector.
Michael
Am Mo., 30. Nov. 2020 um 19:34 Uhr schrieb Duncan P. N. Exon Smith
<dexon...@apple.com>:r<T, 0> has a number of benefits that make it
better in practice than std::vector, at least for LLVM code:
> - Interoperates with the pervasively used SmallVectorImpl<T>.
> - No exception guarantees, and thus none of the related, harmful pessimizations (std::vector::resize will do expensive copies instead of cheap moves in some cases, last I checked).
> - Customizations for using memcpy more often.
> - Smaller by a pointer for most `T` (64-bit pointers).
The LLVM project also hosts libc++, shouldn't we dogfood our own
implementations?
2 and 3 should be optimizations possible in the STL. I disagree that
SmallVectorImpl is pervasively used.
On Nov 17, 2020, at 1:42 PM, David Blaikie <dbla...@gmail.com> wrote:Thoughts/suggestions:
- Adding the default seems very reasonable to me, and I think that 64 bytes is a good default. I think you should change the behavior so that SmallVector<LargeThing> defaults to a single inline element instead of zero though. Perhaps generate a static_assert when it is crazy large.
Out of curiosity: Why a single rather than zero?My rationale for this is basically that SmallVector is typically used for the case when you want to avoid an out-of-line allocation for a small number of elements, this was the reason it was created. While there is some performance benefits of SmallVector<T,0> over std::vector<> they are almost trivial.The performance benefits aren't trivial.std::vector grow operations will refuse to use std::move for some T, a pessimization required by its exception guarantees, even if you're building with `-fno-exceptions`. We had a massive compile-time problem in 2016 related to this that I fixed with 3c406c2da52302eb5cced431373f240b9c037841 by switching to SmallVector<T,0>. You can see the history in r338071 / 0f81faed05c3c7c1fbaf6af402411c99d715cf56.That issue, at least, is fixable without switching from std::vector just by adding noexcept to the appropriate user-defined move constructors.Sure, once we’ve added noexcept to all types in LLVM/Clang/etc. That’s a pretty long tail though; a lot of work for relatively little gain given that we don’t care about exceptions anyway and we have an optimized vector implementation in tree.
On Nov 30, 2020, at 6:04 PM, Sean Silva <chiso...@gmail.com> wrote:I'm okay with either 1 or 0 inlined elements for huge value types.I can buy the argument both ways. It basically comes down to the invariants we want to provide:- invariant for "minimum 0 inlined elements" is "sizeof(SmallVector<T>) <= 64"- this in theory provides stronger invariants related to excessive memory usage from the inlined elements- invariant for "minimum 1 inlined elements" is "at least one inlined element, possibly more if we can fit it in 64 bytes"- this avoids confusion of SmallVector<T> not having inlined elements
I don't think either choice really boxes us out of any future change or even matters much. So why don't we all rally around the 1 inlined element minimum case? Given the name "SmallVector" (which is a bad name, but folks will read it as "vector with inlined elements") it seems least surprising for now.And in practice huge value types are very rare, so honestly I don't think that the invariant provided by "minimum 0" really buys us much in practice when viewed holistically. (also, both cases prevent SmallVector<SmallVector<T>> from multiplicatively exploding in size which makes me happy).I actually was mildly leaning to the "minimum 0" side, but after writing the above I'm now leaning towards "minimum 1".
There's a larger discussion to be had here which I don't want to block this change on: Some folks (including me) believe that SmallVector has drifted from "used for small number of elements" and is in practice more like "LLVM's better vector" (including using 0 inline elements to replace std::vector) and the latter interpretation makes the "minimum 0" case make more sense. But we haven't codified the "use SmallVector / a new llvm::Vector unless you know you need std::vector" policy, and once we do then switching to the "minimum 0" case is pretty trivial. I'm very happy to have that discussion, but it seems incongruous to block the "default N" change on having that discussion.
I'm wondering if we should have an experimentaloption to specify that functions are noexcept by default (overridableby an explicit exception specification)
On Nov 30, 2020, at 6:04 PM, Sean Silva <chiso...@gmail.com> wrote:
I actually was mildly leaning to the "minimum 0" side, but after writing the above I'm now leaning towards "minimum 1".
Overall, I’d recommend a path like this:
1) We decide what to do about the default argument. I agree with Sean that SmallVector<T> should default to 1 at the minimum, and produce an error or warning of T. This makes sense given the bias towards a name that implies inline semantics.
2) We decide whether we want to ban std::vector in the LLVM code base by convention. If so, I think that we should have a *separate* name for the out of line case, e.g. llvm::Vector<T>, which would be a good dual to llvm::SmallVector<T>. If this is the end point (Duncan seems to think it should be) then we should make this part of the coding standard and move the code base towards it over time.
On Dec 1, 2020, at 4:07 PM, Duncan P. N. Exon Smith <dexon...@apple.com> wrote:Can you spell this out for me? Why do we need noexcept if we’re building with -fno-exceptions? What is going on here?Sure. It's a bit convoluted. Here's my understanding:First, here's why std::vector has this behaviour:- std::vector grow operations need to transfer their existing elements over to the new storage.- The grow operations are usually required to meet the "strong exception guarantee": if something throws, this function has no effect.- If move operations throw, you can't provide this guarantee unless you copy (you can't move back the elements that have been half-moved over, in case another exception is thrown; but if it was just a copy, the original storage still has the elements safely unmodified).- There's a caveat / carve out, that if T cannot be copy-constructed AND T's move constructor is not noexcept, then the guarantee is waived (since there's no way to implement it).- Implementation is to call std::move_if_noexcept (https://en.cppreference.com/w/cpp/utility/move_if_noexcept), which moves if it's a noexcept operation, or if T is not copy-constructible.Second, here's why the behaviour doesn't change when -fno-exceptions:- -fno-exceptions does NOT imply `noexcept` (maybe it should?, but it doesn't).- This is implemented by detecting via SFINAE whether something is `noexcept` (maybe std::vector::resize/push_back/etc should have a special case? but that's controversial).IMO, until all the C++ standard libraries and host compilers that we support being built with will consistently use std::move on grow operations in std::vector in -fno-exceptions mode, we should only use std::vector when we absolutely have to. It's not designed for -fno-exceptions codebases
On Dec 1, 2020, at 4:07 PM, Duncan P. N. Exon Smith <dexon...@apple.com> wrote:Can you spell this out for me? Why do we need noexcept if we’re building with -fno-exceptions? What is going on here?Sure. It's a bit convoluted. Here's my understanding:First, here's why std::vector has this behaviour:- std::vector grow operations need to transfer their existing elements over to the new storage.- The grow operations are usually required to meet the "strong exception guarantee": if something throws, this function has no effect.- If move operations throw, you can't provide this guarantee unless you copy (you can't move back the elements that have been half-moved over, in case another exception is thrown; but if it was just a copy, the original storage still has the elements safely unmodified).- There's a caveat / carve out, that if T cannot be copy-constructed AND T's move constructor is not noexcept, then the guarantee is waived (since there's no way to implement it).- Implementation is to call std::move_if_noexcept (https://en.cppreference.com/w/cpp/utility/move_if_noexcept), which moves if it's a noexcept operation, or if T is not copy-constructible.Second, here's why the behaviour doesn't change when -fno-exceptions:- -fno-exceptions does NOT imply `noexcept` (maybe it should?, but it doesn't).- This is implemented by detecting via SFINAE whether something is `noexcept` (maybe std::vector::resize/push_back/etc should have a special case? but that's controversial).IMO, until all the C++ standard libraries and host compilers that we support being built with will consistently use std::move on grow operations in std::vector in -fno-exceptions mode, we should only use std::vector when we absolutely have to. It's not designed for -fno-exceptions codebasesWow, thank you for the great explanation. I agree with you that this seems like a pretty credible reason why we can’t depend on every host std::vector to do what we need, so we should use something like an llvm::Vector.
The LLVM codebase is designed with having exceptions disabled in mind. In such a world, having noexcept everywhere is just noise (“we don’t throw exceptions, so everything is noexcept! yay!”). Furthermore, we provide a CMake build flag to enable exceptions with the intention that user libs with exceptions enabled would link with LLVM and that user lib exceptions would flow through LLVM. If people start adding noexcept everywhere in order to please std::vector, then you will start seeing crashes.
Since the LLVM codebase largely pretends that exceptions don’t exist, I don’t think it should do anything that will change what actually happens in the presence of exceptions. If this means that we prefer llvm::SmallVector to std::vector for performance reasons, then I say “so be it”.
Thanks,
Christopher Tetreault
On 2020 Dec 1, at 21:04, Chris Lattner <clat...@nondot.org> wrote:1) are you, or anyone else, interested in driving an llvm::Vector proposal + coding standard change to get us going in the right direction? I don’t think we need a mass migration of the code base, just get the policy aligned right plus the new type name to exist.
On 2020 Dec 2, at 09:51, James Y Knight <jykn...@google.com> wrote:I strongly disagree here. Not wanting to bother to add 'noexcept' to user-defined move-constructors is a poor justification for switching to a different vector type. Perhaps there are other reasons which might justify avoiding std::vector, but not that...
1) In a design like today where we have two names for “inlined vector” and “not inlined vector”, then I think it is important for “InlinedVector<T>” to have at least 1 element. Defaulting to out of line when using the inline vector name betrays intention: it would be better to generate a compile time error or warning if the element size is too long, because the compiler engineer should use the out of line name.
2) In a design where we have "one name", then it makes more sense to have the default argument type be the “do what I mean” size, which defaults to zero if T is too large.I’m pretty skeptical of #2 as an approach, because inline and out of line design points are pretty different when sizeof(T) is 4 or 8, and when sizeof(thevector) matters, e.g. in 2D cases.
Why does this need a long tail? We have fancy ast refactoring tooling, and a single repository with all the code visible, after all. We can use thar to discover all of the missing noexcepts, and add them, all at once. And then use a clang tidy to help it remain true.
On 2020 Dec 1, at 21:04, Chris Lattner <clat...@nondot.org> wrote:1) are you, or anyone else, interested in driving an llvm::Vector proposal + coding standard change to get us going in the right direction? I don’t think we need a mass migration of the code base, just get the policy aligned right plus the new type name to exist.I'll try to get a minimal patch together with docs and send an RFC later this week.(I think the initial patch could be as simple as:```template <class T> using Vector = SmallVector<T, 0>;```but maybe we'd want to rename `SmallVectorImpl` to `VectorImpl`, or maybe there are other ideas, but off topic for this thread...)
On 2020 Dec 2, at 09:51, James Y Knight <jykn...@google.com> wrote:I strongly disagree here. Not wanting to bother to add 'noexcept' to user-defined move-constructors is a poor justification for switching to a different vector type. Perhaps there are other reasons which might justify avoiding std::vector, but not that...I'll be sure to mention this alternative in the RFC for llvm::Vector; we can talk about it there; maybe you'll convince the rest of us to add the `noexcept`s everywhere and then the justification for llvm::Vector would indeed be weaker (not completely absent though)...
On Nov 30, 2020, at 6:04 PM, Sean Silva <chiso...@gmail.com> wrote:I actually was mildly leaning to the "minimum 0" side, but after writing the above I'm now leaning towards "minimum 1".I'm fine with this as well.Overall, I’d recommend a path like this:This path SGTM.1) We decide what to do about the default argument. I agree with Sean that SmallVector<T> should default to 1 at the minimum, and produce an error or warning of T. This makes sense given the bias towards a name that implies inline semantics.As mentioned above, I think minimum 1 makes sense.
On Dec 2, 2020, at 12:39 PM, Duncan P. N. Exon Smith <dexon...@apple.com> wrote:On 2020 Dec 1, at 21:04, Chris Lattner <clat...@nondot.org> wrote:1) are you, or anyone else, interested in driving an llvm::Vector proposal + coding standard change to get us going in the right direction? I don’t think we need a mass migration of the code base, just get the policy aligned right plus the new type name to exist.I'll try to get a minimal patch together with docs and send an RFC later this week.(I think the initial patch could be as simple as:```template <class T> using Vector = SmallVector<T, 0>;```but maybe we'd want to rename `SmallVectorImpl` to `VectorImpl`, or maybe there are other ideas, but off topic for this thread...)
On Dec 2, 2020, at 3:52 PM, David Blaikie <dbla...@gmail.com> wrote:I’m pretty skeptical of #2 as an approach, because inline and out of line design points are pretty different when sizeof(T) is 4 or 8, and when sizeof(thevector) matters, e.g. in 2D cases.
To me, this is more like std::string, which is (2) - and more, to me, about the contract of the type - std::string doesn't guarantee iterator validity over swap/move, where std::vector does. If std::vector didn't have this guarantee, it'd be possible to have small vector optimizations in std::vector the same way std::string does - without the user-facing customization, mind you.
So I think (2) has some legs - but the problem for me is that if we name the type llvm::Vector, which already gets a bit close to std::vector (unqualified "Vector" seems like fairly subtle code to me - but I'm leaning towards being OK with that aspect if other peopel are) and it has different iterator invalidation/move semantics than std::vector, I think that could be problematic.
On Dec 2, 2020, at 3:52 PM, David Blaikie <dbla...@gmail.com> wrote:I’m pretty skeptical of #2 as an approach, because inline and out of line design points are pretty different when sizeof(T) is 4 or 8, and when sizeof(thevector) matters, e.g. in 2D cases.
To me, this is more like std::string, which is (2) - and more, to me, about the contract of the type - std::string doesn't guarantee iterator validity over swap/move, where std::vector does. If std::vector didn't have this guarantee, it'd be possible to have small vector optimizations in std::vector the same way std::string does - without the user-facing customization, mind you.Yeah I definitely prefer to use standard types if we can, I think there is compelling upthread rationale for why using them in general causes complexity. Maybe as future C++ standards improve we can drop them. I hope someday that ArrayRef and StringRef can drop away in time for example.
So I think (2) has some legs - but the problem for me is that if we name the type llvm::Vector, which already gets a bit close to std::vector (unqualified "Vector" seems like fairly subtle code to me - but I'm leaning towards being OK with that aspect if other peopel are) and it has different iterator invalidation/move semantics than std::vector, I think that could be problematic.Not sure if I want to defend this, but we have lots of precedent for this, including llvm::sort etc.
llvm::Vector has some other nice but not compatible things, e.g. pop_back_val()
-Chris
On Dec 3, 2020, at 10:45 AM, Sean Silva <chiso...@gmail.com> wrote:I'll try to get a minimal patch together with docs and send an RFC later this week.(I think the initial patch could be as simple as:```template <class T> using Vector = SmallVector<T, 0>;```but maybe we'd want to rename `SmallVectorImpl` to `VectorImpl`, or maybe there are other ideas, but off topic for this thread...)Right, I think it is as simple as the using declaration, but also includes a revision to the coding standards and programmer manual dox.I do think that renaming SmallVectorImpl to VectorImpl would make sense, that is something we can stage in over time (introduce the alternate name, rename everything in tree, then drop the old name in a few months).And we can drop the "Impl", since it is a type intended to be used by users :)
On Dec 3, 2020, at 10:26 AM, David Blaikie <dbla...@gmail.com> wrote:I was meaning to highlight that I think a better conceptual abstraction than "small" (1 or more inlined objects) and "not small" (0 inlined objects) it'd be better to have an abstraction more like std::string - where the smallness isn't part of the API, as such.
I'm suggesting that we should have two types:
Reduced sizeof(t) and guaranteed iterator validity over swap/move (ie: must not use an inline buffer)
The rest: Like std::string. It might have an inline buffer, it might have zero, depending on size, etc. But importantly it's not guaranteed to maintain iterator/pointer validity on move/swap, and sizeof is optimized for stack usage.
Neither of these really make sense being called "SmallVector" I suppose. And I'd dislike calling (2) "Vector" even though it's likely the more popular one - because it'd have subtly different semantics from std::vector regarding iterator invalidation.
So I think (2) has some legs - but the problem for me is that if we name the type llvm::Vector, which already gets a bit close to std::vector (unqualified "Vector" seems like fairly subtle code to me - but I'm leaning towards being OK with that aspect if other peopel are) and it has different iterator invalidation/move semantics than std::vector, I think that could be problematic.Not sure if I want to defend this, but we have lots of precedent for this, including llvm::sort etc.
For sure - I'm not trying to advocate for avoiding having a std::vector replacement in LLVM (excuse the several negatives there). And some differences between the C++ standard APIs and the LLVM ones is expected - llvm::sort's and many of the other alternatives are fairly "obvious" I'd say, it sorts a range instead of a pair of iterators - I don't think anyone would find that surprising. Something with subtly different iterator/pointer invalidation semantics under a near exactly matching name wouldn't be such a good fit, though.
On Dec 3, 2020, at 10:26 AM, David Blaikie <dbla...@gmail.com> wrote:I was meaning to highlight that I think a better conceptual abstraction than "small" (1 or more inlined objects) and "not small" (0 inlined objects) it'd be better to have an abstraction more like std::string - where the smallness isn't part of the API, as such.But we have SmallString as a dual to std::string precisely because we need that distinction. Speaking of, the whole default argument thing should probably be applied to SmallString, SmallDenseMap and other types as well.I'm suggesting that we should have two types:
Reduced sizeof(t) and guaranteed iterator validity over swap/move (ie: must not use an inline buffer)
The rest: Like std::string. It might have an inline buffer, it might have zero, depending on size, etc. But importantly it's not guaranteed to maintain iterator/pointer validity on move/swap, and sizeof is optimized for stack usage.
Neither of these really make sense being called "SmallVector" I suppose. And I'd dislike calling (2) "Vector" even though it's likely the more popular one - because it'd have subtly different semantics from std::vector regarding iterator invalidation.I think I see what you’re going for here, and I agree that std::string is a slightly different situation than std::vector given that some implementations have a small string optimizations already.I feel like you’re prioritizing the iterator invalidation behavior as the core difference between the two types, whereas I’m prioritizing the performance/allocation-behavior aspect. I feel like this is a major difference in practice that API users should think about, and they should be prepared to deal with the iterator invalidation issues as necessary.Is the "Reduced sizeof(t) and guaranteed iterator validity over swap/move (ie: must not use an inline buffer)” use case important enough to have a “string-y” type?
As you say, we don’t have a solution for this in the current code other than std::vector. I haven’t heard of this being a significant enough problem to be worth “fixing”, and I don’t think we can drop the inline vs out-of-line distinction.
So I think (2) has some legs - but the problem for me is that if we name the type llvm::Vector, which already gets a bit close to std::vector (unqualified "Vector" seems like fairly subtle code to me - but I'm leaning towards being OK with that aspect if other peopel are) and it has different iterator invalidation/move semantics than std::vector, I think that could be problematic.Not sure if I want to defend this, but we have lots of precedent for this, including llvm::sort etc.
For sure - I'm not trying to advocate for avoiding having a std::vector replacement in LLVM (excuse the several negatives there). And some differences between the C++ standard APIs and the LLVM ones is expected - llvm::sort's and many of the other alternatives are fairly "obvious" I'd say, it sorts a range instead of a pair of iterators - I don't think anyone would find that surprising. Something with subtly different iterator/pointer invalidation semantics under a near exactly matching name wouldn't be such a good fit, though.The difference I meant was that it forwards transparently to array_pod_sort / qsort which is a pretty big behavioral difference.
On Dec 5, 2020, at 10:46 AM, David Blaikie <dbla...@gmail.com> wrote:I think I see what you’re going for here, and I agree that std::string is a slightly different situation than std::vector given that some implementations have a small string optimizations already.I feel like you’re prioritizing the iterator invalidation behavior as the core difference between the two types, whereas I’m prioritizing the performance/allocation-behavior aspect. I feel like this is a major difference in practice that API users should think about, and they should be prepared to deal with the iterator invalidation issues as necessary.Is the "Reduced sizeof(t) and guaranteed iterator validity over swap/move (ie: must not use an inline buffer)” use case important enough to have a “string-y” type?I think things got confused again - reduced sizeof/guaranteed iterator validity isn't a string-y type (because of the iterator validity guarantee such a type can't use an inline buffer - unlike std::string), it's what we're talking about/potentially calling llvm::Vector.
As you say, we don’t have a solution for this in the current code other than std::vector. I haven’t heard of this being a significant enough problem to be worth “fixing”, and I don’t think we can drop the inline vs out-of-line distinction.
Hmm - I think I'm still miscommunicating/not making sense.
I think there's value in having a type that has a guarantee of no-inline-storage - when you want to move and maintain iterator validity.
But I'm wondering if the type with possibly using inline storage could be thought of more like std::string (ie: define the contract: iterators invalidated on move) rather than like SmallVector (ie: "must have an inline buffer"). I don't know what we would call this thing - but my point is to try to move away from the name dictating the implementation detail of having an inline buffer (we could still have SmallVector<T, N> for when you specifically want an inline buffer) but OtherThingVector<T> would not require an inline buffer it would be "a type without iterator validity on move that has, or doesn't have, an inline buffer as it chooses to based on performance tradeoffs".
I think, to me, this would be better than "SmallVector<T> must have at least 1 inline element because of the word "Small" in the name" if we moved away from having Small in the name, and towards some sense of the contract & leaving the implementation to make the choice of how to best implement that contract.
But I think I'm going around in circles/not necessarily making sense here. Perhaps a chat on discord or something would be more effective?
The difference I meant was that it forwards transparently to array_pod_sort / qsort which is a pretty big behavioral difference.
Ah, more llvm::sort(iter, iter) - yeah, hadn't thought about that one. Different performance characteristics, but the same contract, right? So this is working around sub-optimal standard library implementations, but not providing a different contract/requirements, is it?
On Dec 5, 2020, at 10:46 AM, David Blaikie <dbla...@gmail.com> wrote:I think I see what you’re going for here, and I agree that std::string is a slightly different situation than std::vector given that some implementations have a small string optimizations already.I feel like you’re prioritizing the iterator invalidation behavior as the core difference between the two types, whereas I’m prioritizing the performance/allocation-behavior aspect. I feel like this is a major difference in practice that API users should think about, and they should be prepared to deal with the iterator invalidation issues as necessary.Is the "Reduced sizeof(t) and guaranteed iterator validity over swap/move (ie: must not use an inline buffer)” use case important enough to have a “string-y” type?I think things got confused again - reduced sizeof/guaranteed iterator validity isn't a string-y type (because of the iterator validity guarantee such a type can't use an inline buffer - unlike std::string), it's what we're talking about/potentially calling llvm::Vector.Ok, I thought you were talking about bringing that idea to string-y types because std::string doesn’t provide it..
As you say, we don’t have a solution for this in the current code other than std::vector. I haven’t heard of this being a significant enough problem to be worth “fixing”, and I don’t think we can drop the inline vs out-of-line distinction.
Hmm - I think I'm still miscommunicating/not making sense.
I think there's value in having a type that has a guarantee of no-inline-storage - when you want to move and maintain iterator validity.Right, in the vector domain, this is the llvm::Vector type(def).But I'm wondering if the type with possibly using inline storage could be thought of more like std::string (ie: define the contract: iterators invalidated on move) rather than like SmallVector (ie: "must have an inline buffer"). I don't know what we would call this thing - but my point is to try to move away from the name dictating the implementation detail of having an inline buffer (we could still have SmallVector<T, N> for when you specifically want an inline buffer) but OtherThingVector<T> would not require an inline buffer it would be "a type without iterator validity on move that has, or doesn't have, an inline buffer as it chooses to based on performance tradeoffs".
I think, to me, this would be better than "SmallVector<T> must have at least 1 inline element because of the word "Small" in the name" if we moved away from having Small in the name, and towards some sense of the contract & leaving the implementation to make the choice of how to best implement that contract.
But I think I'm going around in circles/not necessarily making sense here. Perhaps a chat on discord or something would be more effective?I think I understand what you’re saying: you’re focusing on making iterator invalidation semantics primary instead of memory layout.
This is a reasonable thing to do, but I don’t think we should go with a design that ignores the “in place ness”.
Maybe there is a “best of both worlds” design proposal that could make sense here?The difference I meant was that it forwards transparently to array_pod_sort / qsort which is a pretty big behavioral difference.
Ah, more llvm::sort(iter, iter) - yeah, hadn't thought about that one. Different performance characteristics, but the same contract, right? So this is working around sub-optimal standard library implementations, but not providing a different contract/requirements, is it?It’s really a “similar but different” API. I works the same in the easy cases, but doesn’t have the same set of contracts, that was my only analogy.
There are lots of other APIs that are similar-but-different in innumerable ways that provide tradeoffs e.g. DenseMap vs std::map, etc. The programmer’s manual has a long list of options :-)