STL list<>.size()

Fray Bentos

unread,

Jul 16, 2001, 1:03:11 PM7/16/01

to

Do all STL implementations calculate the size() is a list<> via :

distance() call ...

int s=0;
while(iterator i=begin(); i!=end(); i++,)
s++;

-----

The STL with gcc 3.0 does this, is there a reason it ?
Sure a list with a number_of_members is preferable ?

I knwo Im probably missing somthing here but it really bies for code
such as :

int func (int i)
{
mylist.push_back(i);
return mylist.size()-1;
}

for(int i=0; i<1000; i++)
func(i);

O(n)

Thanks

---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std...@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.research.att.com/~austern/csc/faq.html ]

Matthew Austern

unread,

Jul 16, 2001, 1:38:14 PM7/16/01

to

"Fray Bentos" <frayb...@bigfoot.com> writes:

> Do all STL implementations calculate the size() is a list<> via :
>
> distance() call ...

No. Some implementations compute the size each time,
others maintain a member variable that contains the
size.

> The STL with gcc 3.0 does this, is there a reason it ?
> Sure a list with a number_of_members is preferable ?

Depends on what you're trying to optimize. Maintaining
a member variable that contains the size makes size()
faster, but it increases the size of a list<> object
and it makes other operations (such as some forms of
list<>::splice) slower. It's a tradeoff either way.

The argument that finally convinced me it was better
not to have such a member variable is that a user who
wants to keep track of a list's size can just do it
by hand; there's nothing the list can do internally
that a user can't do just as well. If a list<>
implementation does have that extra member variable,
however, there's no way for a user who doesn't need
that extra overhead to avoid it.

Daniel Frey

unread,

Jul 16, 2001, 2:42:46 PM7/16/01

to

Matthew Austern wrote:
>
> "Fray Bentos" <frayb...@bigfoot.com> writes:
>
> > Do all STL implementations calculate the size() is a list<> via :
> >
> > distance() call ...
>
> No. Some implementations compute the size each time,
> others maintain a member variable that contains the
> size.

IIRC the standard says that size() is an O(1)-function, right?

Regards, Daniel

--
Daniel Frey

aixigo AG - financial training, research and technology
Schloß-Rahe-Straße 15, 52072 Aachen, Germany
fon: +49 (0)241 936737-42, fax: +49 (0)241 936737-99
eMail: danie...@aixigo.de, web: http://www.aixigo.de

Howard Hinnant

unread,

Jul 16, 2001, 3:10:13 PM7/16/01

to

In article <dil3d7w...@isolde.research.att.com>, Matthew Austern
<aus...@research.att.com> wrote:

| No. Some implementations compute the size each time,
| others maintain a member variable that contains the
| size.

The standardeze that allows this is under 23.1/5: size() has a
non-normative note:

| Notes: Those entries marked ``(Note A)'' should have constant complexity.

Not only is this statement non-normative but "should" doesn't mean
"must".

| > The STL with gcc 3.0 does this, is there a reason it ?
| > Sure a list with a number_of_members is preferable ?
|
| Depends on what you're trying to optimize. Maintaining
| a member variable that contains the size makes size()
| faster, but it increases the size of a list<> object
| and it makes other operations (such as some forms of
| list<>::splice) slower. It's a tradeoff either way.
|
| The argument that finally convinced me it was better
| not to have such a member variable is that a user who
| wants to keep track of a list's size can just do it
| by hand; there's nothing the list can do internally
| that a user can't do just as well. If a list<>
| implementation does have that extra member variable,
| however, there's no way for a user who doesn't need
| that extra overhead to avoid it.

Just for diversity, I hold the opposite view of Matt. :-)

The only operation I know of that an O(1) size makes slower is /one/
form of splice. Overall, there are 5 variations on splice implemented
by 3 overloaded methods. Here is their complexity assuming an O(1)
size:

All list::methods besides splice(some elements, from other) can update
a size member variable in constant time; or in some methods that have a
linear complexity of K*size(), updating size() may take k*size() time
where k is much much smaller than K. That is, updating size may be
linear, but is in the noise level with respect to the work that the
method has to do anyway.

Furthermore, there are several list methods that can take advantage of
an O(1) size to /increase/ their performance:

* Obviously with an O(1) size, size() itself is faster.

* When a list::method must iterate to an indexed element in the list,
an O(1) size can be used to quickly find out whether it would be
quicker to iterate forwards from begin(), or backwards from end().
This technique can be used to speed up *operator=*, *resize*, and all
forms of *assign* except the variation taking input iterators (if the
iterators are not just input, then assign(it, it) can be sped up with
this technique too).

* The operators == and != are more efficient with an O(1) size since
you can check for equal size()'s before you start the element by
element compare.

* For implementing a traditional merge sort (member sort method of
list), it is easier (faster?) to cut the list in half when you have an
O(1) size.

Yes, an O(1) size is going to cost an extra word of memory. But the
Metrowerks std::list has an O(1) size, and the overhead is only 3
words. Most list implementations have an overhead greater than this,
even if they have an O(N) size. One word overhead on the stack plus
padded(2*sizeof(void*)+sizeof(T)) bytes of overhead on the heap is
common. And I've seen bigger than that.

One could get the overhead of list down to 2 words with an O(N) size.
But because of the adverse performance impact on several list methods,
while benefitting just one method, I believe that the O(N) size is not
a good deal from a technical standpoint.

Additionally the standard "encourages" implementors to have an O(1)
size:

* 23.1/5: size() /should/ be O(1)
* 23.2.2.4/14: Complexity (of splice(some)): Constant time if &x ==
this; otherwise, linear time
* Overall design that eliminates "inefficient" operations such as
push_front on vector, or operator[] on list.

Combine the performance analysis, with the "encouragement" of the
standard, and I believe that size() should be O(1).

--
Howard Hinnant
Metrowerks

Matthew Austern

unread,

Jul 16, 2001, 3:25:13 PM7/16/01

to

Daniel Frey <danie...@aixigo.de> writes:

> Matthew Austern wrote:
> >
> > "Fray Bentos" <frayb...@bigfoot.com> writes:
> >
> > > Do all STL implementations calculate the size() is a list<> via :
> > >
> > > distance() call ...
> >
> > No. Some implementations compute the size each time,
> > others maintain a member variable that contains the
> > size.
>
> IIRC the standard says that size() is an O(1)-function, right?

Not quite. It says that size() "should" be O(1), not "shall".
The official meaning of "should" is exactly the same as the
meaning in ordinary English: the standard says that it's a
good idea for it to be O(1), but it does not say that an
implementation where size() is O(N) is nonconforming.

This may seem like a delicate distinction, but it was quite
deliberate.

Fray Bentos

unread,

Jul 17, 2001, 4:31:34 PM7/17/01

to

Thanks to all of you that replied.
I had my suspicions that this would be the case.

Using the same sample code as b4

int func (int i)
{
mylist.push_back(i);
return mylist.size()-1;
}

for(int i=0; i<1000; i++)
func(i);

it is actually faster to use a vector<> and suffer the reallocation
evertime !

vector<>.size() therefore O(const) whereas list<>.size() has O(n).

One more question OT, is this likely to be standardised ?

Later

John Michael Davison

unread,

Jul 17, 2001, 6:38:45 PM7/17/01

to

Fray Bentos ("frayb...@bigfoot.com") writes:

>Do all STL implementations calculate the size() is a list<> via [a theta(n)
>call]?

No, but they certainly have the right to. If they are to conform to
ISO/IEC 14882:1998, size() must be theta(1).

From URL "http://www.stlport.org/resources/StepanovUSA.html", Alexander
Stepanov writes:

>size() used to be linear time in case of STL lists. It was the right decision
>since if a user wants to keep a count it is easy to do, but usually you do not
>need it and it makes splice linear time (it used to be constant). But the
>standard committee insisted that I change it to constant time. I had no
>choice. We will have to change this requirement eventually.

As you can see from URL "http://www.sgi.com/tech/stl/stl_list.h",
Stepanov et al have taken the matter into their own hands.

ISO/IEC 14882:1998 (Section 23.1, Table 65, Note A) says that a
Container's size() method complexity must be constant time. ISO/IEC 14882:1998
(Section 23.2.2.4, line 14) says that the complexity of splice is linear time.

The STL permits a wider range of implementations than ISO/IEC
14882:1998. From "http://www.sgi.com/tech/stl/List.html":

"[list<T>::size()] Returns the size of the list. Note: you
should not assume that this function is constant time. It
is permitted to be O(N), where N is the number of elements
in the list. If you wish to test whether a list is empty,
you should write L.empty() rather than L.size() == 0.

The documentation for the STL-specific "slist" template says the same
thing.

I know nothing about the current goings-on in the ISO committee, so I
can't offer a meaningful opinion on exactly how this is going to end up. The
best I can say is that Stepanov has his reasons, articulated in the above
interview. If you want your code to have similar performance characteristics
across various C++ environments, you must provide for the possibility that your
list class will have a slow size() operation _or_ a slow splice() operation.
The workaround may involve writing your own custom template class.

Matvei Brodski

unread,

Jul 18, 2001, 12:52:06 PM7/18/01

to

John Michael Davison wrote:

> Fray Bentos ("frayb...@bigfoot.com") writes:
>
> >Do all STL implementations calculate the size() is a list<> via [a theta(n)
> >call]?
>
> No, but they certainly have the right to. If they are to conform to
> ISO/IEC 14882:1998, size() must be theta(1).

[snip]

> ISO/IEC 14882:1998 (Section 23.1, Table 65, Note A) says that a
> Container's size() method complexity must be constant time.

Excuse me, it does not say that. Quote: "entries marked '(Note A)'
*should* have constant complexity". This is rather different from
"must be constant time".

The word "should" is used rather infrequenctly in the Standard to assume
it is to be understood as "must". Consider 15.2[3]: "... destructors should
generally catch exceptions and not let them propagate out..."

Matvei.

Rani Sharoni

unread,

Jul 24, 2001, 1:18:46 PM7/24/01

to

Hello All:
I recently faced the some code that I can't compile with gcc 3.0
(online) and vc7 (beta 2) but I can with bcc 5.5
I compiled the following code:

struct A1 {};
struct A2 {};

struct B1 : A1 {};
struct B2 : A1 {};
struct B3 : A2 {};

struct C : B1, B2, B3 {};

void foo(const A1&);
void foo(condt A2&);

void foo_func(const C &c) { foo(c); }

gcc and vc complained about ambiguity.
Bcc choose the foo(const A1&) function.

I noticed that the standard says (4 – 3 and 13.3 – 3):

13.3 -- 3
--Then the best viable function is selected based on the implicit conversion
sequences (_over.best.ics_) needed to match ...

4 -- 3
An expression e can be implicitly converted to a type T if and only if the
declaration ”T t=e;" is well-formed, ...

So it seems that the borland compiler is right because the expression "const
A1 &t=c" is ill-formed making void foo(const A1&) not viable, but I'm not
sure.

Can anyone say if the code is legal?

Eugene Karpachov

unread,

Jul 25, 2001, 12:41:21 AM7/25/01

to

Tue, 24 Jul 2001 17:18:46 GMT Rani Sharoni написал:

> struct A1 {};
> struct A2 {};
>
> struct B1 : A1 {};
> struct B2 : A1 {};
> struct B3 : A2 {};
>
> struct C : B1, B2, B3 {};
>
> void foo(const A1&);
> void foo(condt A2&);
>
> void foo_func(const C &c) { foo(c); }
>
> gcc and vc complained about ambiguity.

They are right.

> Bcc choose the foo(const A1&) function.

For which "const A1&"? Was it
void foo_func(const C &c) { foo( static_cast<const B1 &>(c) ); }

or

void foo_func(const C &c) { foo( static_cast<const B2 &>(c) ); }?

>So it seems that the borland compiler is right because the expression "const

^^^^^
Do you mean "wrong"?

>A1 &t=c" is ill-formed making void foo(const A1&) not viable, but I'm not
>sure.

--
jk

Christopher Eltschka

unread,

Jul 25, 2001, 5:05:56 PM7/25/01

to

Howard Hinnant <hin...@antispam.metrowerks.com> writes:

Let me add a third possibility:

Store the size, but allow it to be marked invalid, maybe through the
value size_type(-1). Now all O(1) operation update the size if it is
valid, all O(n) operations update the size in any case, and the
"offending splice" invalidates the size. If the size has been
invalidated (this can happen only by calling the "offending splice"),
size() is O(N), otherwise it's O(1). Users who care about size()
always being O(1) can just call size() directly after each offending
splice. The pair offending splice+size would be O(N), and all "normal"
calls to size() would be O(1). Users who care about fast splice would
not do that call, and therefore get O(1) splice in every case.

Gawain Bolton

unread,

Jul 28, 2001, 3:32:33 PM7/28/01

to

Christopher Eltschka wrote:

Yes but to mark the size as "invalid" you may need to another member variable
and developpers out there seem to be very frugal when it comes to having a
member variable for the container size...

Unless, of course, you designate a value of 0 to indicate the size may not be
known. Calculating the size again if it really is zero would be negligible.

What disappoints me most with this is that the C++ Standard Library is not
intuitive. Why should the size be available in constant time for some
containers and not for others? The size() function is far too important not to
be optimized.

Saying that users can keep track of size not the issue. The problem is, it's
not obvious that this may need to be done.

I thought the philosophy of the standard library containers was to provide
functions which were efficient. The idea being you are forced to do the
inefficient things in your code and so these are exposed rather then being
hidden in a standard library function.

For this reason, either:

1) The performance of size() should be worded "shall be O(1) (constant time)"
for all containers except perhaps after calling "offending" splice() function
(see above) in which case it may be O(n) (linear time).

or

2) The size() function should be removed from std::list's interface and users
forced to calculate the size by iterating through the list.

It truly saddens me that so much time is wasted with details like this. It's
amazing everyone can't agree on this!

Gawain

John Michael Davison

unread,

Jul 29, 2001, 7:41:12 AM7/29/01

to

I wrote:

> "...[STL implementations] certainly have the right to [calculate the size()
> is a list<> via [a theta(n) call]. If they are to conform to ISO/IEC

> 14882:1998, size() must be theta(1).

I was wrong: Howard Hinnant, Matvei Brodski, and Matthew Austern are correct.
The standard does indeed use the word "should," not "shall" or "must."

Christopher Eltschka

unread,

Jul 30, 2001, 4:43:35 PM7/30/01

to

Gawain Bolton <gbo...@wanadoo.fr> writes:

No, see above. "Unknown" has the value size_type(-1).

>
> Unless, of course, you designate a value of 0 to indicate the size may not be
> known. Calculating the size again if it really is zero would be negligible.

That's of course another option. But with size_type(-1), this check
isn't even necessary.

>
> What disappoints me most with this is that the C++ Standard Library is not
> intuitive. Why should the size be available in constant time for some
> containers and not for others? The size() function is far too important not to
> be optimized.

Because of the logic of the container?

What is the value of splice() if it is not O(1)? After all, you get
the same behaviour with ordinary copy/destroy.

>
> Saying that users can keep track of size not the issue. The problem is, it's
> not obvious that this may need to be done.

OK, there's a point in not definintg size() for std::list at all (just
as forward iterators don't define operator+ due to bad
efficiency). Then one would have to use distance(l.begin(), l.end()),
which is obviously slow.

>
> I thought the philosophy of the standard library containers was to provide
> functions which were efficient. The idea being you are forced to do the
> inefficient things in your code and so these are exposed rather then being
> hidden in a standard library function.

The problem with list:size is that there are two functions which
cannot be made efficient at the same time. One of them is size, and
the other is a certain case of splice. You can't have your cake and
eat it, too.

However, the "almost always O(1)" would IMHO be a sufficient solution.

>
> For this reason, either:
>
> 1) The performance of size() should be worded "shall be O(1) (constant time)"
> for all containers except perhaps after calling "offending" splice() function
> (see above) in which case it may be O(n) (linear time).
>
> or
>
> 2) The size() function should be removed from std::list's interface and users
> forced to calculate the size by iterating through the list.
>
> It truly saddens me that so much time is wasted with details like this. It's
> amazing everyone can't agree on this!

Now that we have a standard, and we have a size member, 2 is not
really an option.

Indeed, I have the impression that the set of containers is not as
well designed as the set of iterators (possibly because the notion of
them being merely examples). For example, we have iterator_traits and
iterator_category, but we don't have a similar mechanism for
containers. We have the fast member/slow general function pairs for
iterators, but generally not for containers (and in the cases where we
have such members - like sort -, there is no algorithm which can use
them when appropriate, and use the general algorithm otherwise).

Howard Hinnant

unread,

Jul 31, 2001, 7:13:48 AM7/31/01

to

In article <t3d76it...@watts.itp.tuwien.ac.at>, Christopher
Eltschka <celt...@web.de> wrote:

| What is the value of splice() if it is not O(1)? After all, you get
| the same behaviour with ordinary copy/destroy.

I continually see the adverse performance effects of O(1) size
exaggerated for splice. I believe this is yet another case.

When you have O(1) size, splice(iterator position, list& x, iterator
first, iterator last) is distinctly not the same behavior (or
performance) as a copy/destroy algorithm. splice still /moves/ nodes.
It *does* *not* create new nodes and destroy old ones (assuming equal
allocators here) which /would/ be much more expensive. The only extra
thing this variation of splice must do is:

size_type delta = std::distance(first, last);

That is, it must count the number of increments it takes to get from
first to last. That's it! The constant on this O(some) operation is
/very/ small compared to copying and destroying list elements.

I am somewhat amazed at the interest this one line of code generates.
:-)

--
Howard Hinnant
Metrowerks

P.J. Plauger

unread,

Jul 31, 2001, 7:14:12 AM7/31/01

to

"Christopher Eltschka" <celt...@web.de> wrote in message news:t3d76it...@watts.itp.tuwien.ac.at...

> What is the value of splice() if it is not O(1)? After all, you get
> the same behaviour with ordinary copy/destroy.

Time complexity isn't the whole story. Copy/destroy requires just that,
one copy construction and one destruction per element moved. Not to
mention a probable node allocation and freeing, plus a lot of link
restitching per element along the way. Counting nodes by chaining along
links, OTOH, is rather inexpensive.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com

Rani Sharoni

unread,

Aug 12, 2001, 9:26:48 AM8/12/01

to

Sorry but I made a mistake, Bcc choose "void foo(const A2&)".
So can anyone tell if "void foo(const A1&)" is a viable function?

Thanks,
Rani

j...@steel.orel.ru (Eugene Karpachov) wrote in message news:<slrn9lsh...@localhost.localdomain>...

> Tue, 24 Jul 2001 17:18:46 GMT Rani Sharoni написал:
> > struct A1 {};
> > struct A2 {};
> >
> > struct B1 : A1 {};
> > struct B2 : A1 {};
> > struct B3 : A2 {};
> >
> > struct C : B1, B2, B3 {};
> >
> > void foo(const A1&);
> > void foo(condt A2&);
> >

> > void foo func(const C &c) { foo(c); }

> >
> > gcc and vc complained about ambiguity.
>
> They are right.
>
> > Bcc choose the foo(const A1&) function.

correction: Bcc choose the foo(const A2&) function.

>
> For which "const A1&"? Was it

> void foo func(const C &c) { foo( static cast<const B1 &>(c) ); }
>
> or
>
> void foo func(const C &c) { foo( static cast<const B2 &>(c) ); }?

>
>
> >So it seems that the borland compiler is right because the expression "c
> onst
> ^^^^^
> Do you mean "wrong"?
>
> >A1 &t=c" is ill-formed making void foo(const A1&) not viable, but I'm
> not
> >sure.

---

Francis Glassborow

unread,

Aug 12, 2001, 11:26:48 PM8/12/01

to

In article <f1395bf8.01081...@posting.google.com>, Rani
Sharoni <sadp...@yahoo.com> writes

>Sorry but I made a mistake, Bcc choose "void foo(const A2&)".
>So can anyone tell if "void foo(const A1&)" is a viable function?
>
>Thanks,
>Rani
>

>j...@steel.orel.ru (Eugene Karpachov) wrote in message news:<slrn9lshih.g9.jk@loca
>lhost.localdomain>...
>> Tue, 24 Jul 2001 17:18:46 GMT Rani Sharoni ÎÁÐÉÓÁÌ:

>> > struct A1 {};
>> > struct A2 {};
>> >
>> > struct B1 : A1 {};
>> > struct B2 : A1 {};
>> > struct B3 : A2 {};
>> >
>> > struct C : B1, B2, B3 {};
>> >
>> > void foo(const A1&);
>> > void foo(condt A2&);
>> >
>> > void foo func(const C &c) { foo(c); }
>> >
>> > gcc and vc complained about ambiguity.

Any compiler that makes a choice is wrong. the parameter func has
equally viable conversions to A1 and A2 so how do you think it should
choose? By tossing a coin? Now you have a second ambiguity in the case
of A1, because C has two A1 subobjects, but the existence of that does
not mean that the compiler should choose the lesser ambiguity and go for
A2.

Francis Glassborow ACCU
64 Southfield Rd
Oxford OX4 1PA +44(0)1865 246490
All opinions are mine and do not represent those of any organisation

Mark Williams

unread,

Aug 13, 2001, 6:14:38 PM8/13/01

to

Francis Glassborow <francis.g...@ntlworld.com> wrote in message news:<EppEWRBXJqd7Ew$P...@ntlworld.com>...

> >> > struct A1 {};
> >> > struct A2 {};
> >> >
> >> > struct B1 : A1 {};
> >> > struct B2 : A1 {};
> >> > struct B3 : A2 {};
> >> >
> >> > struct C : B1, B2, B3 {};
> >> >
> >> > void foo(const A1&);
> >> > void foo(condt A2&);
> >> >
> >> > void foo func(const C &c) { foo(c); }
> >> >
> >> > gcc and vc complained about ambiguity.
>
> Any compiler that makes a choice is wrong. the parameter func has
> equally viable conversions to A1 and A2 so how do you think it should
> choose? By tossing a coin?

No. By the rules of the standard, as stated by the OP:

>> 13.3 -- 3
>> --Then the best viable function is selected based on the implicit
conversion
>> sequences (_over.best.ics_) needed to match ...

>> 4 -- 3
>> An expression e can be implicitly converted to a type T if and only
if the

>> declaration ?T t=e;" is well-formed, ...

Given an expression c of type C, the declaration "A1 t = c;" is not
well formed, so there is no implicit conversion from C to A1, so
foo(const A1 &) is not a viable function, so the compiler *must*
choose foo(const A2 &).

> Now you have a second ambiguity in the case
> of A1, because C has two A1 subobjects, but the existence of that does
> not mean that the compiler should choose the lesser ambiguity and go for
> A2.

Which seems to be in direct contradiction of the standard. Can you
support your position?

Mark Williams

Francis Glassborow

unread,

Aug 13, 2001, 8:47:08 PM8/13/01

to

In article <10f784d5.01081...@posting.google.com>, Mark
Williams <mar...@my-deja.com> writes

>Given an expression c of type C, the declaration "A1 t = c;" is not
>well formed, so there is no implicit conversion from C to A1, so
>foo(const A1 &) is not a viable function, so the compiler *must*
>choose foo(const A2 &).
>
>> Now you have a second ambiguity in the case
>> of A1, because C has two A1 subobjects, but the existence of that does
>> not mean that the compiler should choose the lesser ambiguity and go for
>> A2.
>
>Which seems to be in direct contradiction of the standard. Can you
>support your position?

13.3.3.1 para 10 refers to ambiguous conversion sequences.

If 4 para 3 applied then we would not be able to have ambiguous
conversion sequences because in all such cases the specified declaration
would be ill-formed.

I think this is a defect in the Standard, that should be corrected by
finding wording to make the case we are discussing ambiguous. While I
understand the interpretation of the OP, I find it hard to believe that
we would actually want code to be have that way.

Francis Glassborow ACCU
64 Southfield Rd
Oxford OX4 1PA +44(0)1865 246490
All opinions are mine and do not represent those of any organisation

---