std::vector get a pointer to end of raw data


David Hunter

6 Apr 2014, 21:23:21
to: std-pr...@isocpp.org
C++11 added vector::data to get a raw pointer to the first element in the vector. Which is great as now you don't have to write &vec.front( ) or similar.

So how about a pointer to the end of the data, or one past the end? Well it looks like we're stuck with &vec.back( ) + 1, or vec.data( ) + vec.size( ), or ....

So given vector::front, vector::back and vector::begin( ), vector::end( ), why only a vector::data? Maybe vector::begin_data( ) and vector::end_data( ) would have been more consistent. Anyway, whatever it's called, a single method to get either a pointer to the last element or a pointer one past the last element would be nice.

I am motivated here by a number of C mathematical functions I call that take a start pointer and a one-past-the-end pointer to the input data.

David

Thiago Macieira

7 Apr 2014, 0:01:36
to: std-pr...@isocpp.org
On Sunday, 6 April 2014, at 18:23:21, David Hunter wrote:
> C++11 added vector::data to get a raw pointer to the first element in the
> vector. Which is great as now you don't have to write &vec.front( ) or
> similar.
>
> So how about a pointer to the end of the data, or one past the end? Well it
> looks like we're stuck with &vec.back( ) + 1, or vec.data( ) + vec.size( ),
> or ....

While it would be convenient to have such a method, vec.data() + vec.size() is
the answer you're looking for.

There are two possible implementations of such an end_data() method, depending
on what the vector implementation stores internally:

- if it stores a begin pointer and a size, the implementation would be:
T *end_data() const { return data() + size(); }
which is what your code is doing

- if it stores begin and end pointers, the implementation would be:
T *end_data() const { return m_end_data; }
size_t size() const { return m_end_data - m_begin_data; }
which means your code of vec.data() + vec.size() gets optimised to
m_end_data anyway

So this function would be a convenience, but not an optimisation.
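
The two layouts described above can be sketched roughly as follows. This is only an illustration: SizeBasedStorage and PointerBasedStorage are hypothetical stand-ins for whatever a real std::vector implementation keeps internally, and they do not own or allocate anything.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical layout A: a begin pointer plus an element count.
template <typename T>
struct SizeBasedStorage {
    T* m_begin_data;
    std::size_t m_size;

    T* data() const { return m_begin_data; }
    std::size_t size() const { return m_size; }
    // Computes the same sum that user code would write by hand.
    T* end_data() const { return data() + size(); }
};

// Hypothetical layout B: a begin pointer plus an end pointer.
template <typename T>
struct PointerBasedStorage {
    T* m_begin_data;
    T* m_end_data;

    T* data() const { return m_begin_data; }
    std::size_t size() const {
        assert(m_begin_data <= m_end_data);
        return static_cast<std::size_t>(m_end_data - m_begin_data);
    }
    // A plain member read; data() + size() reduces to the same pointer.
    T* end_data() const { return m_end_data; }
};
```

In both layouts end_data() and data() + size() yield the same pointer, which is why the member would add convenience rather than speed.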

--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center
PGP/GPG: 0x6EF45358; fingerprint:
E067 918B B660 DBD1 105C 966C 33F5 F005 6EF4 5358

David Rodríguez Ibeas

7 Apr 2014, 12:04:28
to: std-pr...@isocpp.org
There is no performance advantage either in providing 'data()' from doing 'vector.empty()? 0 : &vector.front()', or 'vector.empty()? 0 : &*vector.begin()'.

The question, as I understood it, is more one of style than of implementation. Of course, in this case the difference in style is smaller (since the conditional is already handled inside 'data()' and need not be replicated), so it might not be worth it, but it should not be considered only in terms of performance.



--

---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposal...@isocpp.org.
To post to this group, send email to std-pr...@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposals/.

Greg Marr

7 Apr 2014, 14:29:59
to: std-pr...@isocpp.org
On Monday, April 7, 2014 12:04:28 PM UTC-4, David Rodríguez Ibeas wrote:
> There is no performance advantage either in providing 'data()' from doing 'vector.empty()? 0 : &vector.front()', or 'vector.empty()? 0 : &*vector.begin()'.

Yes there is.  It eliminates a check to see if the vector is empty.
 
> The question, as I understood it, is more one of style than of implementation. Of course, in this case the difference in style is smaller (since the conditional is already handled inside 'data()' and need not be replicated), so it might not be worth it, but it should not be considered only in terms of performance.

There is no conditional in std::vector::data().  It simply returns the pointer to the internal array storage.

David Rodríguez Ibeas

7 Apr 2014, 14:40:13
to: std-pr...@isocpp.org
Fine, but that is still missing the point. There are proposed changes and changes that went into the standard that are just stylistic. Consider a different case: 

vector<T>::cbegin() vs. casting to 'const vector<T>&' and calling 'begin()'

No performance there. Same comment as in previous message: this is a different level of improvement in style, 'data()+size()' is not as ugly as the cast above.


David Hunter

7 Apr 2014, 15:46:27
to: std-pr...@isocpp.org, dib...@ieee.org
Just to clarify, my main concern was more obvious user code rather than efficiency. I'm sure a good compiler would optimize vec.data( ) + vec.size( ) to be the same as vec.end_data( ). I guess it depends on the implementation of vector, as mentioned by Thiago earlier, which as a user of vector I don't care about. I suspect a vec.end_data( ) would never end up being less efficient than vec.data( ) + vec.size( ) or friends.

If we don't care about simpler user code why add vec.data()? &vec.front() or equivalents work just as well. Given the following

c_func( &vec.front( ), &vec.end( ) + 1 );
c_func( vec.data( ), vec.data() + vec.size( ) );
c_func( vec.begin_data( ), vec.end_data( ) );

or maybe, the following if we follow the begin/end free functions to allow any data type that can produce a raw begin and end pointer.

c_func( begin_data( stuff ), end_data( stuff ) );

I find the latter more obvious. I implement the free functions above in my own code and they seem to work nicely.
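
One possible shape for such free functions is sketched below. The post does not show the actual helpers, so the names begin_data and end_data are taken from the text while their signatures are an assumption; c_sum is a hypothetical stand-in for a C routine that takes a start pointer and a one-past-the-end pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helpers (not part of the standard library) for any
// container that exposes data() and size().
template <typename C>
auto begin_data(C& c) -> decltype(c.data()) { return c.data(); }

template <typename C>
auto end_data(C& c) -> decltype(c.data()) { return c.data() + c.size(); }

// Overloads for built-in arrays, mirroring std::begin / std::end.
template <typename T, std::size_t N>
T* begin_data(T (&a)[N]) { return a; }

template <typename T, std::size_t N>
T* end_data(T (&a)[N]) { return a + N; }

// Stand-in for a C routine taking the half-open range [first, last).
double c_sum(const double* first, const double* last) {
    assert(first <= last);
    double total = 0.0;
    for (; first != last; ++first) total += *first;
    return total;
}
```

With these in place a call reads c_sum( begin_data( stuff ), end_data( stuff ) ) for a vector or an array alike, and an empty vector simply yields an empty range.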




Nevin Liber

7 Apr 2014, 15:57:42
to: std-pr...@isocpp.org
On 7 April 2014 14:46, David Hunter <davidh...@gmail.com> wrote:
> Just to clarify, my main concern was more obvious user code rather than efficiency. I'm sure a good compiler would optimize vec.data( ) + vec.size( ) to be the same as vec.end_data( ). I guess it depends on the implementation of vector, as mentioned by Thiago earlier, which as a user of vector I don't care about. I suspect a vec.end_data( ) would never end up being less efficient than vec.data( ) + vec.size( ) or friends.
>
> If we don't care about simpler user code why add vec.data()? &vec.front() or equivalents work just as well. Given the following
>
> c_func( &vec.front( ), &vec.end( ) + 1 );

You probably meant &vec.back() + 1.  Unfortunately, that expression depends on the precondition !vec.empty().  Much prefer:
 
c_func( vec.data( ), vec.data() + vec.size( ) );

As mentioned earlier, this is the safe way to do it which doesn't depend on a precondition.
 
I don't see half-open ranges showing up all that much in C.  And it will lead to endless confusion in something like std::string, as it will not include the trailing '\0'.  I'd be neutral  to weakly against it.
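
The std::string concern can be made concrete with a small sketch. The function names range_length and c_buffer_bytes are made up for this illustration; the only library fact relied on is that, since C++11, s.data() + s.size() points at the terminating '\0'.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Length of the half-open range [s.data(), s.data() + s.size()).
std::size_t range_length(const std::string& s) {
    const char* first = s.data();
    const char* last = s.data() + s.size();  // points AT the terminator
    assert(*last == '\0');                   // so the range excludes it
    return static_cast<std::size_t>(last - first);
}

// Bytes a C API needs in order to copy the string with its terminator
// (assuming no embedded '\0' characters).
std::size_t c_buffer_bytes(const std::string& s) {
    return std::strlen(s.c_str()) + 1;
}
```

A C function handed that half-open range sees one byte fewer than one handed c_str() and told to copy the terminator too, which is the mismatch the paragraph above warns about.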
--
 Nevin ":-)" Liber <mailto:ne...@eviloverlord.com> (847) 691-1404

Thiago Macieira

7 Apr 2014, 16:33:50
to: std-pr...@isocpp.org
On Monday, 7 April 2014, at 11:29:59, Greg Marr wrote:
> On Monday, April 7, 2014 12:04:28 PM UTC-4, David Rodríguez Ibeas wrote:
> > There is no performance advantage either in providing 'data()' from doing
> > 'vector.empty()? 0 : &vector.front()', or 'vector.empty()? 0 :
> > &*vector.begin()'.
>
> Yes there is. It eliminates a check to see if the vector is empty.

The check is not necessary because:

> There is no conditional in std::vector::data(). It simply returns the
> pointer to the internal array storage.

end = begin + size
size = 0
∴ end == begin

and when you have a range of [begin, end) with begin == end, the range is
empty, no matter what the value of begin is. You don't need to perform any
checks.

Thiago Macieira

7 Apr 2014, 16:41:39
to: std-pr...@isocpp.org
On Monday, 7 April 2014, at 14:57:42, Nevin Liber wrote:
> I don't see half-open ranges showing up all that much in C. And it will
> lead to endless confusion in something like std::string, as it will not
> include the trailing '\0'. I'd be neutral to weakly against it.

True, but very often the API is derived from C++ needs and it ends up using
half-open ranges as parameters.

For example, for QString (a vector of QChar), there are just as many internal
functions taking begin & size as there are functions taking begin & end. It
often boils down to who wrote the code, what was most optimal at the time of
writing, and how much refactoring the code has gone through.

And FYI, neither QString, nor QByteArray, nor QVector provide an "end_data"
pointer API. We either do data() + size(), or we "abuse" the knowledge the
iterator for those three classes is just a pointer, so begin() and end() serve
the same purpose.

QString, QByteArray, QVector::data have been available since Qt 3. It's
extremely useful to tell QByteArray to allocate N bytes, pass the data pointer
to a C function for filling in, then return the array.
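
The same allocate, fill, return pattern works with std::vector. The sketch below uses std::vector<char> instead of QByteArray, and c_fill is a hypothetical stand-in for the C function; its signature and payload are assumptions made only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical C-style producer: writes at most `capacity` bytes into
// `out` and returns how many bytes it actually wrote.
std::size_t c_fill(char* out, std::size_t capacity) {
    static const char payload[] = "hello";
    const std::size_t available = sizeof(payload) - 1;  // without '\0'
    const std::size_t written = available < capacity ? available : capacity;
    if (written != 0) std::memcpy(out, payload, written);
    return written;
}

std::vector<char> read_payload(std::size_t capacity) {
    std::vector<char> buffer(capacity);  // allocate N bytes up front
    const std::size_t written = c_fill(buffer.data(), buffer.size());
    assert(written <= buffer.size());
    buffer.resize(written);              // keep only what was produced
    return buffer;
}
```

Because data() needs no emptiness check, read_payload(0) is fine as long as the C function does not touch the pointer when the capacity is zero.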

Greg Marr

7 Apr 2014, 17:26:39
to: std-pr...@isocpp.org
On Monday, April 7, 2014 2:40:13 PM UTC-4, David Rodríguez Ibeas wrote:
> Fine, but that is still missing the point.

No, it's not.

> There are proposed changes and changes that went into the standard that are just stylistic.

The data() function is not stylistic.  It provides a capability that you could not have before, which is to get a pointer to the internal array without first checking to see if the vector is empty.

> Consider a different case:
>
> vector<T>::cbegin() vs. casting to 'const vector<T>&' and calling 'begin()'
> No performance there. Same comment as in previous message: this is a different level of improvement in style, 'data()+size()' is not as ugly as the cast above.

That is an extremely different level.  There is really no comparison between adding v.cbegin() to replace (static_cast<vector<std::vector<std::wstring>> &>(v)).begin(); and adding v.end_data() to replace v.data() + v.size()

Greg Marr

7 Apr 2014, 17:35:43
to: std-pr...@isocpp.org
On Monday, April 7, 2014 4:33:50 PM UTC-4, Thiago Macieira wrote:
> On Monday, 7 April 2014, at 11:29:59, Greg Marr wrote:
> > On Monday, April 7, 2014 12:04:28 PM UTC-4, David Rodríguez Ibeas wrote:
> > > There is no performance advantage either in providing 'data()' from doing
> > > 'vector.empty()? 0 : &vector.front()', or 'vector.empty()? 0 :
> > > &*vector.begin()'.
> >
> > Yes there is.  It eliminates a check to see if the vector is empty.
>
> The check is not necessary because:
>
> > There is no conditional in std::vector::data().  It simply returns the
> > pointer to the internal array storage.
>
> end = begin + size
> size = 0
> ∴ end == begin
>
> and when you have a range of [begin, end) with begin == end, the range is
> empty, no matter what the value of begin is. You don't need to perform any
> checks.

Yes, exactly, data() is a performance advantage because it eliminates an unnecessary test for the vector being empty when working with either half-open ranges or start plus size, and that's why adding it was more than just stylistic.

Greg Marr

7 Apr 2014, 17:38:43
to: std-pr...@isocpp.org
On Monday, April 7, 2014 5:26:39 PM UTC-4, Greg Marr wrote:
> > Consider a different case:
> >
> > vector<T>::cbegin() vs. casting to 'const vector<T>&' and calling 'begin()'
> > No performance there. Same comment as in previous message: this is a different level of improvement in style, 'data()+size()' is not as ugly as the cast above.
>
> That is an extremely different level.  There is really no comparison between adding v.cbegin() to replace (static_cast<vector<std::vector<std::wstring>> &>(v)).begin(); and adding v.end_data() to replace v.data() + v.size()

especially since I forgot the "const" in the cast, and so in the end, didn't even end up getting the same thing as cbegin(), and didn't notice until my third or fourth time reading the code.  :)
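
The slip is easy to reproduce. The sketch below uses a hypothetical alias Table for the nested vector type and checks at compile time what each spelling returns: the cast without const still selects the non-const begin(), while the const cast and cbegin() agree.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical alias for the nested type used in the example above.
using Table = std::vector<std::vector<std::wstring>>;

// Spelling that works without cbegin(): cast to const, then begin().
Table::const_iterator first_via_cast(Table& v) {
    return static_cast<const Table&>(v).begin();
}

// C++11 spelling.
Table::const_iterator first_via_cbegin(Table& v) { return v.cbegin(); }

// Without the const in the cast, begin() yields a mutable iterator.
static_assert(std::is_same<decltype(std::declval<Table&>().begin()),
                           Table::iterator>::value,
              "non-const access yields iterator");
static_assert(std::is_same<decltype(std::declval<const Table&>().begin()),
                           Table::const_iterator>::value,
              "const access yields const_iterator");
static_assert(std::is_same<decltype(std::declval<Table&>().cbegin()),
                           Table::const_iterator>::value,
              "cbegin() yields const_iterator");

// Both spellings refer to the same position.
void check_equivalence(Table& v) {
    assert(first_via_cast(v) == first_via_cbegin(v));
}
```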

Thiago Macieira

7 Apr 2014, 19:01:27
to: std-pr...@isocpp.org
On Monday, 7 April 2014, at 14:35:43, Greg Marr wrote:
> > The check is not necessary because:
> >
> > > There is no conditional in std::vector::data(). It simply returns the
> > > pointer to the internal array storage.
> >
> > end = begin + size
> > size = 0
> > ∴ end == begin
> >
> > and when you have a range of [begin, end) with begin == end, the range is
> > empty, no matter what the value of begin is. You don't need to perform
> > any
> > checks.
>
> Yes, exactly, data() is a performance advantage because it eliminates an
> unnecessary test for the vector being empty when working with either
> half-open ranges or start plus size, and that's why adding it was more than
> just stylistic.

Sorry, I think I had misunderstood you.

Yes, data() is a performance advantage, since you can't call front() or
*begin() on an empty container.

My point is that end_data() is not a performance advantage. It's just
syntactic sugar.

David Hunter

7 Apr 2014, 20:53:58
to: std-pr...@isocpp.org
Actually, the comment about performance with an empty vector made me look at the standard. I can see the following:

23.3.6.4 vector data [vector.data]
T* data() noexcept;
const T* data() const noexcept;
1 Returns: A pointer such that [data(), data() + size()) is a valid range. For a non-empty vector, data() == &front().
2 Complexity: Constant time.


So my question is: what does this mean for an empty vector?

I am wondering if there is any guarantee that the data function never returns nullptr. My understanding of the standard is that adding 0 to a nullptr is valid and you get a nullptr.
So [data(), data() + size()) would be two nullptrs. I did a search for "valid range" in the standard and it doesn't seem to be defined. Maybe the "A pointer" implies a non-null pointer, but I couldn't find anywhere in the standard where it says "A pointer" must not be a nullptr.

David

Thiago Macieira

7 Apr 2014, 21:11:42
to: std-pr...@isocpp.org
On Monday, 7 April 2014, at 17:53:58, David Hunter wrote:
> Actually, the comment about performance with an empty vector made me look at
> the standard. I can see the following:
>
> 23.3.6.4 vector data [vector.data]
> T* data() noexcept;
> const T* data() const noexcept;
> 1 Returns: A pointer such that [data(), data() + size()) is a valid range.
> For a non-empty vector, data() == &front().
> 2 Complexity: Constant time.
>
> So my question is what does this mean for an empty vector.
>
> I am wondering if there is any guarantee that the data function never
> returns nullptr. My understanding of the standard is that adding 0 to a
> nullptr is valid and you get a nullptr.

There's no guarantee that I can see, either. It can return anything it wants
if the size is zero. The range [x, x) is empty for all x. You can't do *x in
that case, just like you can't do *end().

I would recommend that implementations not return null there, just so
applications don't get in the habit of checking data() == nullptr to test if
the array is empty. At the same time, they should return an address that will
cause a segmentation fault if dereferenced, to catch applications that try to
dereference everything.

For example: it can return (T*) alignof(T).

> So [data(), data() + size()) would be two nullptrs. I did a search for
> "valid range" in the standard and it doesn't seem to be defined. Maybe the
> "A pointer" implies a non-null pointer, but I couldn't find anywhere in the
> standard where it says "A pointer" must not be a nullptr.

I don't see why it would have to be non-null.

Greg Marr

7 Apr 2014, 21:17:41
to: std-pr...@isocpp.org
On Monday, April 7, 2014 7:01:27 PM UTC-4, Thiago Macieira wrote:
> Sorry, I think I had misunderstood you.
>
> Yes, data() is a performance advantage, since you can't call front() or
> *begin() on an empty container.
>
> My point is that end_data() is not a performance advantage. It's just
> syntactic sugar.

Yes, we are in agreement on both points.

Greg Marr

7 Apr 2014, 21:22:40
to: std-pr...@isocpp.org
On Monday, April 7, 2014 8:53:58 PM UTC-4, David Hunter wrote:
> Actually, the comment about performance with an empty vector made me look at the standard.
>
> So my question is: what does this mean for an empty vector?

In general, if the vector has always been empty, it will return nullptr, but if the vector is empty because it had data and has been resized to be empty, it will return the pointer to the allocated array.  In neither case are you allowed to dereference the pointer, because the size is zero.

> I am wondering if there is any guarantee that the data function never returns nullptr.

It is perfectly valid to return nullptr, so there is no such guarantee.
 
> My understanding of the standard is that adding 0 to a nullptr is valid and you get a nullptr.
> So [data(), data() + size()) would be two nullptrs.

That's correct, and this call will be perfectly valid on that range:

for_each(data(), data() + size(), [](auto x) { cout << x; });
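
A self-contained variant of that call is sketched below, written against a vector named v and collecting the output in a string so the result can be inspected. The function name join_digits is made up for this example, and the lambda takes int instead of auto so that it also compiles as C++11.

```cpp
#include <algorithm>
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Streams every element of v through the raw pointer range
// [v.data(), v.data() + v.size()) and returns the concatenated text.
std::string join_digits(const std::vector<int>& v) {
    const int* first = v.data();         // may be null when v is empty
    const int* last = first + v.size();  // adding 0 to null is allowed
    assert(v.empty() == (first == last));

    std::ostringstream out;
    std::for_each(first, last, [&out](int x) { out << x; });
    return out.str();
}
```

No emptiness check appears anywhere: whether data() returns nullptr or a pointer to leftover capacity, the loop body never runs for an empty vector.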

Farid Mehrabi

9 Apr 2014, 9:52:45
to: std-pr...@isocpp.org
Range is a general concept represented by a pair of similar-type iterators as its bounds. Validity of a range implies the validity of its bounds, but not just that: a valid range must not be empty. When pointers are used as iterators, validity of the iterators means non-null pointers, since validity of a pointer **does mean** the validity of what it points to, and nullptr points to somewhere that is not supposed to contain anything. So I don't see any reason to emphasize that nullptr is invalid.

By this observation, we can see that if 'size()' equals 0, then [data(), data() + size()) is an invalid range regardless of the validity of 'data()'. So my understanding is that, although an implementation might be allowed to return a non-null pointer as its 'data()' when the vector is empty, validity of the range can be checked by comparing its bounds, which is implicitly done in any 'for'-based iterative algorithm.

regards,
FM.


--
how am I supposed to end the twisted road of  your hair in the dark night,
unless the candle of your face does not turn a lamp on up my way?

Nevin Liber

9 Apr 2014, 10:33:58
to: std-pr...@isocpp.org
On 9 April 2014 08:52, Farid Mehrabi <farid....@gmail.com> wrote:
> When pointers are used as iterators, validity of the iterators means non-null pointers,

No, it doesn't.
 
> since validity of a pointer **does mean** the validity of what it points to,

No it doesn't.  Valid pointers are not necessarily dereferenceable.  This is true for vector.data() + vector.size(), nullptr and a pointer just past the end of an array.  The last two (other than the spelling of nullptr) have been true since the C90 standard.
 
> By this observation, we can see that if 'size()' equals 0, then [data(), data() + size()) is an invalid range regardless of the validity of 'data()'.

It's a perfectly valid empty range.  That is the point of half-open ranges: it gives us a way to represent empty ranges without having to specifically check for them.

Olaf van der Spek

11 Apr 2014, 7:33:48
to: std-pr...@isocpp.org
 What about writing C++ wrappers that take containers or something like array_view?
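
A wrapper along those lines might look like the sketch below. Everything here is hypothetical: c_mean stands in for one of the C mathematical functions from the first post, and double_view is a deliberately tiny substitute for an array_view-style type, not the proposed library component.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for a C routine taking the half-open range [first, last).
double c_mean(const double* first, const double* last) {
    if (first == last) return 0.0;  // empty range: nothing to average
    double total = 0.0;
    for (const double* p = first; p != last; ++p) total += *p;
    return total / static_cast<double>(last - first);
}

// Minimal non-owning view; converts implicitly from a vector or array.
struct double_view {
    const double* first;
    const double* last;

    double_view(const std::vector<double>& v)
        : first(v.data()), last(v.data() + v.size()) {}

    template <std::size_t N>
    double_view(const double (&a)[N]) : first(a), last(a + N) {}
};

// C++ wrapper: callers pass a container and never spell out the pointers.
double mean(double_view values) {
    assert(values.first <= values.last);
    return c_mean(values.first, values.last);
}
```

The data() + size() arithmetic is then written once, inside the view, and call sites shrink to mean(vec).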