Collection.size behavior

Benjamin

Apr 30, 2008, 8:45:51 PM
Does anyone know the reason that Collection.size returns
Integer.MAX_VALUE when the collection size is greater than that?
The reason I'm asking is because we, the Python (programming language)
developers, are considering imitating this with our sequences. It
seems to me that this is akin to silently lying and could be quite
confusing. Am I missing some practical benefit from this?

Kenneth P. Turvey

Apr 30, 2008, 8:59:23 PM

This is actually a problem sometimes. I think it would be better to
return -1 if you have to return an int, or some value that indicates that
the size is unknown, but greater than Integer.MAX_VALUE.

Just my 2 cents. This will end up being a problem for Java at some
point.

--
Kenneth P. Turvey <kt-u...@squeakydolphin.com>

Arne Vajhøj

Apr 30, 2008, 9:16:52 PM

The size method returns an int, so it simply cannot return
a larger value.

BTW, some collection types cannot contain more elements. Which
type are you using?

Arne

Patricia Shanahan

Apr 30, 2008, 11:03:28 PM

I think it may have been an attempt to break as little existing code as
possible while dealing with the problem that some collections might
contain more than Integer.MAX_VALUE elements. For example, any
size-based test that a collection is non-empty or contains at least N
elements still works correctly.
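
To illustrate the point, here is a minimal Java sketch. The class and method names are made up for this example and are not part of the JDK. It clamps a long element count to Integer.MAX_VALUE, as the Collection.size() contract describes, and shows that "non-empty" and "at least N" tests still give the right answer, whereas a plain narrowing cast would not:

```java
// Hypothetical demo class; none of these names come from the JDK.
class SaturatingSizeDemo {

    /** Clamp a long element count to int, as the Collection.size() contract describes. */
    static int saturate(long realCount) {
        return realCount > Integer.MAX_VALUE ? Integer.MAX_VALUE : (int) realCount;
    }

    /** A size-based "at least n elements" test; n is an int, so n <= Integer.MAX_VALUE. */
    static boolean hasAtLeast(int reportedSize, int n) {
        return reportedSize >= n;
    }

    public static void main(String[] args) {
        long huge = 3_000_000_000L;          // more elements than an int can count
        int reported = saturate(huge);       // clamped to Integer.MAX_VALUE

        System.out.println(reported);                          // prints 2147483647
        System.out.println(reported > 0);                      // prints true: non-empty test holds
        System.out.println(hasAtLeast(reported, 1_000_000));   // prints true: "at least N" holds

        // By contrast, a plain narrowing cast wraps around to a negative number,
        // which would make the same tests fail.
        System.out.println((int) huge);                        // prints -1294967296
    }
}
```

The trade-off is that any code needing the exact count cannot tell "exactly Integer.MAX_VALUE" apart from "more than that".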

Patricia

Message has been deleted

Mark Space

May 1, 2008, 12:06:41 AM

I'm with you -- don't lie.

I'd return a long, just to be safer, or maybe some equivalent of BigNum.

Or throw an exception: "You called the int size() routine but we have
more than int so bummer for you" sounds like a good message to me. The
exception will be the speed freaks' cue that they need to call a
different size() routine.

Kenneth P. Turvey

May 1, 2008, 12:35:48 AM
On Wed, 30 Apr 2008 21:06:41 -0700, Mark Space wrote:

> I'm with you -- don't lie.
>
> I'd return a long, just to be safer, or maybe some equivalent of BigNum.
>
> Or throw an exception: "You called the int size() routine but we have
> more than int so bummer for you" sounds like a good message to me. The
> exception will be the speed freaks' cue that they need to call a
> different size() routine.

I like this better than my return -1 suggestion. If you must stick with
an int, and I wouldn't if you don't have to, then throw an exception.

In Java, something like a ReturnValueTooBigException. :-)

Kenneth P. Turvey

May 1, 2008, 12:38:27 AM
On Thu, 01 May 2008 03:43:59 +0000, Stefan Ram wrote:

[Snip]
>
> Therefore, it seems that this regulation can not break existing code.

I think Patricia was referring to breakage that might occur as the size
of a collection passes the Integer.MAX_VALUE boundary.

Patricia Shanahan

May 1, 2008, 1:03:13 AM

The ReturnValueTooBigException could even have a method declared as
"long size()" that reports the actual size of the collection.
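
A rough sketch of that idea, assuming an unchecked exception (a checked one could not be added to the existing size() signature without breaking existing callers). Only the name ReturnValueTooBigException comes from this thread; the demo class and its size(long) helper are hypothetical stand-ins for a real collection:

```java
// Hypothetical sketch; nothing here exists in the JDK.
class ReturnValueTooBigDemo {

    /** Stand-in for a collection's int size() whose real count is tracked as a long. */
    static int size(long realCount) {
        if (realCount > Integer.MAX_VALUE) {
            throw new ReturnValueTooBigException(realCount);
        }
        return (int) realCount;
    }

    public static void main(String[] args) {
        try {
            size(3_000_000_000L);
        } catch (ReturnValueTooBigException e) {
            // The caller recovers the true count from the exception itself.
            System.out.println("actual size: " + e.size()); // prints "actual size: 3000000000"
        }
    }
}

/** Thrown when the real size does not fit in an int; carries the real size. */
class ReturnValueTooBigException extends RuntimeException {
    private final long actualSize;

    ReturnValueTooBigException(long actualSize) {
        super("size " + actualSize + " does not fit in an int");
        this.actualSize = actualSize;
    }

    /** The actual number of elements, as a long. */
    long size() {
        return actualSize;
    }
}
```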

Patricia

Owen Jacobson

May 1, 2008, 1:04:39 AM

In Java, if a method's signature declares that it returns a type, the
implementation cannot return an incompatible type. For reasons known
only to the Java 1.2 team, the Collections API uses 'int' as the
return type from Collection.size(), so the largest value that can
possibly be returned is Integer.MAX_VALUE.

Not so in Python: methods have no signatures as such, and (as with
Smalltalk, Ruby, Lisp, and many other languages) arithmetic on bignums
is identical to and compatible with arithmetic on machine integers.

Report the real size, if the size is available at all; use a bignum if
you have to. Your users won't notice the difference (except perhaps
in speed, but generally size isn't called in a tight loop) and your
API will be consistent between small, in-memory collections and
massive or procedurally-generated collections.

Joshua Cranmer

May 1, 2008, 5:31:17 PM
Benjamin wrote:
> Does anyone know the reason that Collection.size returns
> Integer.MAX_VALUE when the collection size is greater than that?

The bulk of the API prefers 32-bit integers unless they are
insufficient, and Integer.MAX_VALUE is the largest value a Java int can
represent. The other choices would have been to return a negative
number, e.g., -1, which is not intuitive, or to throw an exception,
which violates the maxim that exceptions represent `exceptional'
circumstances.

> The reason I'm asking is because we, the Python (programming language)
> developers, are considering imitating this with our sequences. It
> seems to me that this is akin to silently lying and could be quite
> confusing. Am I missing some practical benefit from this?

My opinion, along with most of the others in this thread, is to not
emulate this quirk. It is Java's fixed int return type that forces the
capped value, and Python does not have the same structures hindering it.

That said, I would not call it "silently lying," but more an
understatement. AFAICT, the collections that Java provides (excluding
wrappers) cannot have a size greater than Integer.MAX_VALUE anyways, so
the idea was probably a future-proofing decision. In addition, I feel
that most collections code which needs the precise size would fail with
collections for which the return value is wrong anyways, so the return
value issue is more or less moot.
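
For code that does need the precise count, one possible workaround is sketched below: trust size() unless it reports Integer.MAX_VALUE, and only then count by iterating. The class and method names are hypothetical, and the fallback walk is O(n) over billions of elements, so this is an illustration rather than a recommendation:

```java
import java.util.Arrays;
import java.util.Collection;

// Hypothetical helper; not part of the JDK.
class ExactSizeDemo {

    /**
     * Returns the element count as a long. size() is trusted unless it
     * reports Integer.MAX_VALUE, which may mean "this many or more".
     */
    static long exactSize(Collection<?> c) {
        int reported = c.size();
        if (reported < Integer.MAX_VALUE) {
            return reported;
        }
        // Possibly saturated: count the elements one by one.
        long count = 0;
        for (Object ignored : c) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        Collection<String> small = Arrays.asList("a", "b", "c");
        System.out.println(exactSize(small)); // prints 3
    }
}
```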

--
Beware of bugs in the above code; I have only proved it correct, not
tried it. -- Donald E. Knuth

Arne Vajhøj

May 1, 2008, 7:52:14 PM
Mark Space wrote:
> Benjamin wrote:
>> Does anyone know the reason that Collection.size returns
>> Integer.MAX_VALUE when the collection size is greater than that?
>> The reason I'm asking is because we, the Python (programming language)
>> developers, are considering imitating this with our sequences. It
>> seems to me that this is akin to silently lying and could be quite
>> confusing. Am I missing some practical benefit from this?
>
> I'm with you -- don't lie.
>
> I'd return a long, just to be safer, or maybe some equivalent of BigNum.

But changing that would make a lot of existing code not compile.

Arne

Arne Vajhøj

May 1, 2008, 7:53:47 PM
Owen Jacobson wrote:
> On Apr 30, 8:45 pm, Benjamin <musiccomposit...@gmail.com> wrote:
>> Does anyone know the reason that Collection.size returns
>> Integer.MAX_VALUE when the collection size is greater than that?
>> The reason I'm asking is because we, the Python (programming language)
>> developers, are considering imitating this with our sequences. It
>> seems to me that this is akin to silently lying and could be quite
>> confusing. Am I missing some practical benefit from this?
>
> In Java, if a method's signature declares that it returns a type, the
> implementation cannot return an incompatible type. For reasons known
> only to the Java 1.2 team, the Collections API uses 'int' as the
> return type from Collection.size(), so the largest value that can
> possibly be returned is Integer.MAX_VALUE.

The collections that are backed by an array cannot contain
more elements than an int can count.

It seems rather consistent (but not necessarily wise) that
all collections, array-backed or otherwise, have the same
limits as arrays.

Arne

Lew

May 1, 2008, 9:09:25 PM
Arne Vajhøj wrote:
> The collections that are backed by an array cannot contain
> more elements than an int can count.
>
> It seems rather consistent (but not necessarily wise) that
> all collections, array-backed or otherwise, have the same
> limits as arrays.

Those who seek to create collections with more elements than can be counted
by int will surely need to bump up the JVM's -Xmx parameter somewhat.

--
Lew

Arne Vajhøj

May 1, 2008, 10:02:59 PM

Yep.

I find it difficult to believe that it is a real problem today.

But it will become a problem in maybe 10 years.

Arne

Kenneth P. Turvey

May 2, 2008, 12:22:23 AM
On Thu, 01 May 2008 22:02:59 -0400, Arne Vajhøj wrote:

> Yep.
>
> I find it difficult to believe that it is a real problem today.
>
> But it will become a problem in maybe 10 years.

There has been at least one poster to this group that has had a problem
with it. I would be surprised to find that nobody was dealing with
collections with this many elements now.

In 10 years it will be a problem we have to grapple with on a regular
basis. It is a bad design. Throwing an exception would have made much
more sense.

Patricia Shanahan

May 2, 2008, 9:53:54 AM

Although I have not personally encountered the problem, I was running
Java on machines with enough memory for a collection with more than
Integer.MAX_VALUE elements in 2002.

Patricia

Daniel Pitts

May 2, 2008, 5:15:53 PM
He's talking about the Python implementation, so that isn't an issue.

--
Daniel Pitts' Tech Blog: <http://virtualinfinity.net/wordpress/>

Daniel Pitts

May 2, 2008, 5:16:41 PM
Unless of course the collection is backed by a non-memory resource.

Arne Vajhøj

May 2, 2008, 5:59:28 PM
Daniel Pitts wrote:

> Arne Vajhøj wrote:
>> Mark Space wrote:
>>> Benjamin wrote:
>>>> Does anyone know the reason that Collection.size returns
>>>> Integer.MAX_VALUE when the collection size is greater than that?
>>>> The reason I'm asking is because we, the Python (programming language)
>>>> developers, are considering imitating this with our sequences. It
>>>> seems to me that this is akin to silently lying and could be quite
>>>> confusing. Am I missing some practical benefit from this?
>>>
>>> I'm with you -- don't lie.
>>>
>>> I'd return a long, just to be safer, or maybe some equivalent of BigNum.
>>
>> But changing that would make a lot of existing code not compile.
>>
> He's talking about the Python implementation, so that isn't an issue.

I don't think so.

long is Java, not Python.

Arne

Arne Vajhøj

May 2, 2008, 7:31:54 PM
Kenneth P. Turvey wrote:
> On Thu, 01 May 2008 22:02:59 -0400, Arne Vajhøj wrote:
>> Yep.
>>
>> I find it difficult to believe that it is a real problem today.
>>
>> But it will become a problem in maybe 10 years.
>
> There has been at least one poster to this group that has had a problem
> with it. I would be surprised to find that nobody was dealing with
> collections with this many elements now.

Somebody probably has. But not many.

> In 10 years it will be a problem we have to grapple with on a regular
> basis. It is a bad design. Throwing an exception would have made much
> more sense.

The real problem is not the return value but the return type.

Arne

Mark Space

May 4, 2008, 1:53:12 AM
Patricia Shanahan wrote:

>
> Although I have not personally encountered the problem, I was running
> Java on machines with enough memory for a collection with more than
> Integer.MAX_VALUE elements in 2002.

If you count virtual memory and hard drive space, I think all of us have
had machines capable of holding Collections 2 gigabytes and larger in
size for longer than that.

However, if one has a Collection backed by, say, a database with
100,000,000,000 rows, I don't think a Collection would be the best API
to access it. So this whole discussion might be highly theoretical.

Perhaps Sun (or someone else) could come up with an API that's more
convenient for manipulating databases and is similar to Collections, if
that's desired.

Kenneth P. Turvey

May 4, 2008, 9:40:06 AM
On Sat, 03 May 2008 22:53:12 -0700, Mark Space wrote:
[Snip]

> However, if one has a Collection backed by, say, a database with
> 100,000,000,000 rows, I don't think a Collection would be the best API
> to access it. So this whole discussion might be highly theoretical.
>
> Perhaps Sun (or someone else) could come up with an API that's more
> convenient for manipulating databases and is similar to Collections, if
> that's desired.
[Snip]

For many tasks that would require more than Integer.MAX_VALUE, the
Collections interface is just fine. The only reason we don't do it often
now is that our machines aren't really up to the task.

It won't be long before that simply isn't true.

Several other APIs already exist for accessing database objects. I don't
think we really need another one. You might really find a database
backed collection to be a good interface for many operations.

Lew

May 4, 2008, 10:24:32 AM
Kenneth P. Turvey wrote:
> Several other APIs already exist for accessing database objects. I don't
> think we really need another one. You might really find a database
> backed collection to be a good interface for many operations.

Databases nicely combine huge key domains with (relatively) sparse key sets,
then add built-in storage and caching, remote accessibility, sophisticated
query capabilities and a soupçon of programmatic connectivity.

Even terabyte-scale stores draw their keys from domains with sagans and sagans
of possible values, so in that sense they're sparse while still being huge.
Serving that oxymoronic goal is what DBMSes do well, for varying degrees of well.

Some are even perfect for in-memory embedded work, scale permitting. Apache
Derby (Java DB) comes to mind.

These won't be a drop-in replacement for Collections, of course. They can map
to Collections readily enough, via JPA and such, but your app will necessarily
think in wider terms than mere keystore. So coding, deployment and operations
effort increases, but you do get a huge capability boost with a DBMS.

--
Lew

Arne Vajhøj

May 4, 2008, 11:26:20 AM
Mark Space wrote:
> Patricia Shanahan wrote:
>> Although I have not personally encountered the problem, I was running
>> Java on machines with enough memory for a collection with more than
>> Integer.MAX_VALUE elements in 2002.
>
> If you count virtual memory and hard drive space, I think all of us have
> had machines capable of holding Collections 2 gigabytes and larger in
> size for longer than that.

Those working with servers.

> However, if one has a Collection backed by, say, a database with
> 100,000,000,000 rows, I don't think a Collection would be the best API
> to access it. So this whole discussion might be highly theoretical.

I agree. I must admit that I have never seen a non-memory backed
collection.

Arne

Kenneth P. Turvey

May 4, 2008, 11:59:16 AM
On Sun, 04 May 2008 10:24:32 -0400, Lew wrote:

[Snip]


> These won't be a drop-in replacement for Collections, of course. They
> can map to Collections readily enough, via JPA and such, but your app
> will necessarily think in wider terms than mere keystore. So coding,
> deployment and operations effort increases, but you do get a huge
> capability boost with a DBMS.

[Snip]

The point of all this was that database backed Collections may have their
place. I think they do. So we already have a use case for collections
with more than Integer.MAX_VALUE entries.

Kenneth P. Turvey

May 4, 2008, 12:01:08 PM
On Sun, 04 May 2008 11:26:20 -0400, Arne Vajhøj wrote:

> I agree. I must admit that I have never seen a non-memory backed
> collection.

I've thought about writing one on several occasions for various reasons.
I've never actually done it, but it does come up with regularity. I'm
sure others have actually implemented them.

Mark Space

May 4, 2008, 12:58:49 PM
Kenneth P. Turvey wrote:
> On Sun, 04 May 2008 11:26:20 -0400, Arne Vajhøj wrote:
>
>> I agree. I must admit that I have never seen a non-memory backed
>> collection.
>
> I've thought about writing one on several occasions for various reasons.
> I've never actually done it, but it does come up with regularity. I'm
> sure others have actually implemented them.
>

If for no other reason than the limitation on size(), I think a
different API would be the best way to start.

Lew

May 4, 2008, 1:11:48 PM

Given that such an API would almost certainly have to deal with persistent
stores, and that such collections actually are persistence abstractions, we
could call the API the Java Persistence API, and have it manifest a library to
interface an object-oriented, collection-based model to an arbitrary
persistence engine. We should make it annotation- or descriptor-file-based at
the architect's will, and it can then use more-or-less POJO types to represent
the entities to be collected. This whole JPA layer could abstract the mapping
between the backing store and the object model. In an ideal world such a thing
would already be standardized,
<http://java.sun.com/javaee/5/docs/tutorial/doc/bnbpy.html>
and have at least two solid, free implementations,
<http://www.hibernate.org/>
<http://openjpa.apache.org/>
<https://glassfish.dev.java.net/downloads/persistence/JavaPersistence.html>
and perhaps work well with, even be included as part of existing application
servers and frameworks.
<https://glassfish.dev.java.net/>
<http://www.springframework.org/>
<http://www-306.ibm.com/software/webservers/appserv/was/>

--
Lew

Mark Space

May 4, 2008, 7:41:25 PM
Lew wrote:

Wow that Kenneth guy is a fast worker! ;)

Arne Vajhøj

May 4, 2008, 10:00:01 PM
Kenneth P. Turvey wrote:
> On Sun, 04 May 2008 11:26:20 -0400, Arne Vajhøj wrote:
>> I agree. I must admit that I have never seen a non-memory backed
>> collection.
>
> I've thought about writing one on several occasions for various reasons.
> I've never actually done it, but it does come up with regularity. I'm
> sure others have actually implemented them.

I cannot find one with Google either.

The closest is:

http://java.sun.com/j2se/1.5.0/docs/guide/collections/designfaq.html#5

Arne

Kenneth P. Turvey

May 5, 2008, 1:21:35 AM
On Sun, 04 May 2008 16:41:25 -0700, Mark Space wrote:

> Wow that Kenneth guy is a fast worker! ;)

Wait.. I was the guy that said we didn't need another API!

Roedy Green

May 5, 2008, 6:46:36 AM
On Wed, 30 Apr 2008 17:45:51 -0700 (PDT), Benjamin
<musiccom...@gmail.com> wrote, quoted or indirectly quoted
someone who said :

>Does anyone know the reason that Collection.size returns
>Integer.MAX_VALUE when the collection size is greater than that?

I suppose that when Collection was written, the authors were
unconsciously thinking in terms of 32-bit JVMs so had size return int
rather than long. It would have been impossible to have a ram-based
collection with more elements than that.
--

Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com

Arne Vajhøj

May 5, 2008, 10:43:24 AM
Roedy Green wrote:
> On Wed, 30 Apr 2008 17:45:51 -0700 (PDT), Benjamin
> <musiccom...@gmail.com> wrote, quoted or indirectly quoted
> someone who said :
>> Does anyone know the reason that Collection.size returns
>> Integer.MAX_VALUE when the collection size is greater than that?
>
> I suppose that when Collection was written, the authors were
> unconsciously thinking in terms of 32-bit JVMs so had size return int
> rather than long. It would have been impossible to have a ram-based
> collection with more elements than that.

It is still impossible to have an array backed collection with
more elements on a 64 bit JVM.

Arne

Kenneth P. Turvey

May 5, 2008, 11:58:36 AM
On Mon, 05 May 2008 10:43:24 -0400, Arne Vajhøj wrote:

>> I suppose that when Collection was written, the authors were
>> unconsciously thinking in terms of 32-bit JVMs so had size return int
>> rather than long. It would have been impossible to have a ram-based
>> collection with more elements than that.
>
> It is still impossible to have an array backed collection with more
> elements on a 64 bit JVM.

True enough, but I'm not sure why Roedy brings this up anyway. There is
no reason to believe that a Collection must be array backed (or for that
matter, backed by a single array).

Arne Vajhøj

May 5, 2008, 5:58:38 PM

Some are.

And having some collections able to contain more elements
than others could be said to expose implementation.

It could be argued that the List interface should contain
a note that says max. 2G elements, because then the interface
would be consistent.

Arne

Roedy Green

May 10, 2008, 9:26:13 AM
On Mon, 05 May 2008 10:43:24 -0400, Arne Vajhøj <ar...@vajhoej.dk>
wrote, quoted or indirectly quoted someone who said :

>It is still impossible to have an array backed collection with
>more elements on a 64 bit JVM.

Collections might be backed on disk or using an array of arrays which
could in theory break the Integer.MAX_VALUE limit.
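
A minimal sketch of the array-of-arrays idea, assuming fixed-size chunks of long values addressed by a long index. ChunkedLongArray is a hypothetical name, not a JDK class, and the demo uses tiny sizes so that it runs on any heap:

```java
// Hypothetical sketch; ChunkedLongArray is not a JDK class.
class ChunkedLongArray {
    private final long[][] chunks;
    private final int chunkSize;
    private final long length;

    ChunkedLongArray(long length, int chunkSize) {
        if (length < 0 || chunkSize <= 0) {
            throw new IllegalArgumentException("length must be >= 0 and chunkSize > 0");
        }
        this.length = length;
        this.chunkSize = chunkSize;
        // Number of chunks, rounded up; toIntExact fails loudly if that count overflows an int.
        int chunkCount = Math.toIntExact(length / chunkSize + (length % chunkSize == 0 ? 0 : 1));
        chunks = new long[chunkCount][];
        for (int i = 0; i < chunkCount; i++) {
            long remaining = length - (long) i * chunkSize;
            chunks[i] = new long[(int) Math.min(chunkSize, remaining)];
        }
    }

    /** Total number of elements, reported as a long rather than an int. */
    long size() {
        return length;
    }

    long get(long index) {
        checkIndex(index);
        return chunks[(int) (index / chunkSize)][(int) (index % chunkSize)];
    }

    void set(long index, long value) {
        checkIndex(index);
        chunks[(int) (index / chunkSize)][(int) (index % chunkSize)] = value;
    }

    private void checkIndex(long index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + length);
        }
    }

    public static void main(String[] args) {
        ChunkedLongArray a = new ChunkedLongArray(10, 4); // chunks of 4, 4 and 2 elements
        a.set(9, 99L);
        System.out.println(a.size() + " " + a.get(9)); // prints "10 99"
    }
}
```

With a chunk size such as 1 << 30 and enough heap, the same structure can address indices beyond Integer.MAX_VALUE, though it cannot implement Collection.size() faithfully for that very reason.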

Collection.toArray implies an Integer.MAX_VALUE limit on collection
size.

At some point Java will acquire 64-bit indexed arrays. It will be
interesting to see how they stitch them in.
