
Size of an arraylist in bytes


sara

Nov 20, 2011, 4:01:44 PM
Hi All,

I create an ArrayList<Integer> tmp and add some integers to it.
Afterward, I measure the size of tmp in bytes (by converting tmp to
a byte array). Say the resulting size is C. However, when I update an
element of tmp and measure the size of tmp in bytes again, the result is
different from C!
Why is this the case?

Best
Sara

markspace

Nov 20, 2011, 4:05:56 PM
We'd have to see some code to give you a good answer, but basically you
can't measure the memory size of Java objects. They can change over time,
in ways that objects in C or C++ can't or don't, and there's not much you
can do about it.


sara

Nov 20, 2011, 4:11:00 PM
Here is the code:

ArrayList<Integer> tmp=new ArrayList<Integer>();
tmp.add(-1);
tmp.add(-1);
System.out.println(DiGraph.GetBytes(tmp).length);
tmp.set(0, 10);
System.out.println(DiGraph.GetBytes(tmp).length);


public static byte[] GetBytes(Object v) {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    ObjectOutputStream oos;
    try {
        oos = new ObjectOutputStream(bos);
        oos.writeObject(v);
        oos.flush();
        oos.close();
        bos.close();
    } catch (IOException e) {
        e.printStackTrace();
    }
    byte[] data = bos.toByteArray();
    return data;
}

The problem is that I need to write multiple ArrayLists to disk and later
update their elements. I store the starting location and size of each
ArrayList so that I can refer to it later. If the size of the objects
changes, that bookkeeping breaks! Could you please help?

Eric Sosman

Nov 20, 2011, 4:30:41 PM
See markspace's response. Another possible point of confusion:
The ArrayList does not actually contain objects, but references to
those objects -- that's why the same object instance can be in three
ArrayLists, two Sets, and a Map simultaneously. In fact, the same
Integer object could appear forty-two times in a single ArrayList:

    List<Integer> list = new ArrayList<Integer>();
    Integer number = Integer.valueOf(42);
    for (int i = 0; i < 42; ++i)
        list.add(number);

If you're coming from a C background, a rough analogy is that
the ArrayList holds "pointers" to the objects it holds, not copies
of those objects.

--
Eric Sosman
eso...@ieee-dot-org.invalid

sara

Nov 20, 2011, 4:35:30 PM
> esos...@ieee-dot-org.invalid

But do you have any answer to my second question?

Andreas Leitgeb

Nov 20, 2011, 4:58:19 PM
sara <saras...@gmail.com> wrote:
> Here is the code:
> ArrayList<Integer> tmp=new ArrayList<Integer>();
> tmp.add(-1);
> tmp.add(-1);
> System.out.println(DiGraph.GetBytes(tmp).length);
> tmp.set(0, 10);
> System.out.println(DiGraph.GetBytes(tmp).length);
>
> public static byte[] GetBytes(Object v) {
> ByteArrayOutputStream bos = new ByteArrayOutputStream();
> ObjectOutputStream oos;
> try {
> oos = new ObjectOutputStream(bos);
> oos.writeObject(v);

The serialization output size of an ArrayList<Integer> depends on
more than just the number of Integer elements in the array. There
is the capacity, which may be larger than the size, but what really
spoils it for you is the Integer-objects, which get serialized along
with the array. If both are the same, only one Integer-object gets saved,
but if you change the value for one, then you get two different
Integer-objects serialized along with the actual array, and thus
you get more bytes.

If you need fixed-size records for your arrays (assuming a fixed
size() ), you might have more luck with arrays of primitives:

If you had:
int[] tmp = new int[2]; tmp[0]=-1; tmp[1]=-1;
and dumped that array onto oos, then changed tmp[0]=0,
it's very likely you'd see the same number of bytes
dumped afterwards.
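
The effect described above can be checked with a small experiment. This is only a sketch: the class name SerializedSizeDemo and the helper serializedSize are made up for illustration, and the exact byte counts are deliberately not stated because they depend on the serialization format details.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;

// Hypothetical demo class, not part of the original poster's code.
class SerializedSizeDemo {
    // Returns the number of bytes Java serialization produces for the object.
    static int serializedSize(Object o) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.size();
    }

    public static void main(String[] args) {
        ArrayList<Integer> list = new ArrayList<Integer>();
        list.add(-1);
        list.add(-1);      // autoboxing reuses the same cached Integer for -1
        int listBefore = serializedSize(list);
        list.set(0, 10);   // the list now refers to two distinct Integer objects
        int listAfter = serializedSize(list);

        int[] array = {-1, -1};
        int arrayBefore = serializedSize(array);
        array[0] = 10;     // primitives are stored inline, 4 bytes each
        int arrayAfter = serializedSize(array);

        System.out.println("ArrayList<Integer>: " + listBefore + " -> " + listAfter);
        System.out.println("int[]:              " + arrayBefore + " -> " + arrayAfter);
    }
}
```

The list's serialized size should grow after the update, because a second Integer object has to be written instead of a back-reference to the first, while the int[] size should stay the same.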

Eric Sosman

Nov 20, 2011, 4:58:33 PM
On 11/20/2011 4:44 PM, Stefan Ram wrote:
> Eric Sosman<eso...@ieee-dot-org.invalid> writes:
>> If you're coming from a C background, a rough analogy is that
>> the ArrayList holds "pointers" to the objects it holds, not copies
>> of those objects.
>
> An ArrayList /does/ hold pointers (in the sense of Java),
> this is not just »a rough analogy«:
>
> »(...) reference values (...) are pointers«

They're "pointers" in Java's terms, but Java is considerably
more restrictive about what you can do with a "pointer" than C is.
You cannot, for example, print the value of a Java reference; you
can do so in C. You cannot convert a Java reference to or from an
integer; C allows it (with traps for the unwary). Java references
obey a type hierarchy; C's types (and hence the pointers to them)
are unrelated. And so on, and so on: Little niggly differences.
Since Java's references support (and prohibit) a different set of
operations than C's pointers do, I maintain they're as similar as
dogs and wolves, and as different.

Put it this way: If I had told sara "An ArrayList contains
C-style pointers to the objects it holds," would I have been
telling the truth?

--
Eric Sosman
eso...@ieee-dot-org.invalid

Patricia Shanahan

Nov 20, 2011, 5:04:47 PM
On 11/20/2011 1:58 PM, Eric Sosman wrote:
...
> Put it this way: If I had told sara "An ArrayList contains
> C-style pointers to the objects it holds," would I have been
> telling the truth?
>

No, but if you had said "An ArrayList contains pointers to the objects
it holds." you would have been telling the exact truth.

The baggage that C added to pointers was an unfortunate aberration, not
something that should ever be considered to be the default definition of
"pointer".

Patricia

markspace

Nov 20, 2011, 5:08:03 PM
On 11/20/2011 1:11 PM, sara wrote:

> The problem is I need to write multiple arraylists on disk and later
> on I update the elements of them. I store the starting location of
> arraylists and their size such that later I can refer to them. If the
> size of objects change then it messes up! Could you please help?


Yes, this is the problem. You have to use something different from an
ArrayList, because the ArrayList does change size.

Look into plain arrays, IntBuffer, DataInputStream and DataOutputStream.

It would also help now if we knew why you want to store multiple
ArrayLists on disk. What is it you are trying to do?
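
As one possible reading of the DataOutputStream suggestion, here is a minimal sketch that encodes an int[] as raw 4-byte values, so the encoded length depends only on the element count. The class and method names (FixedSizeEncoding, encode) are invented for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper, shown only to illustrate fixed-size encoding.
class FixedSizeEncoding {
    // Writes each int as exactly 4 big-endian bytes, with no object metadata.
    static byte[] encode(int[] values) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bos)) {
            for (int v : values) {
                out.writeInt(v);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) {
        int[] tmp = {-1, -1};
        System.out.println(encode(tmp).length);  // 2 ints * 4 bytes
        tmp[0] = 10;
        System.out.println(encode(tmp).length);  // unchanged by the update
    }
}
```

With this layout a record of n ints always occupies 4 * n bytes, whatever the values are.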

Arne Vajhøj

Nov 20, 2011, 5:18:25 PM
On 11/20/2011 5:04 PM, Patricia Shanahan wrote:
> On 11/20/2011 1:58 PM, Eric Sosman wrote:
> ...
>> Put it this way: If I had told sara "An ArrayList contains
>> C-style pointers to the objects it holds," would I have been
>> telling the truth?
>>
>
> No, but if you had said "An ArrayList contains pointers to the objects
> it holds." you would have been telling the exact truth.

Yes.

> The baggage that C added to pointers was an unfortunate aberration, not
> something that should ever be considered to be the default definition of
> "pointer".

C/C++ pointers have certainly caused a lot of problems over the
years.

But the languages would not have been the same without them. And
I even doubt that they would have been as popular.

C and C++ were not chosen for lack of alternatives; languages without
"do anything you want" pointers did exist.

Arne

Eric Sosman

Nov 20, 2011, 5:19:22 PM
On 11/20/2011 4:35 PM, sara wrote:
>[...]
> But do you have any answer to my second question?

Only that you're going about it wrong. As Andreas Leitgeb points
out, serializing an object is a different proposition than serializing
a bunch of "raw" values: It saves enough information to reconstruct an
"image" of the original object, with the same structure.

What do I mean by "structure?" Something like this:

Integer x = new Integer(42);
Integer y = new Integer(42);

Here we have two distinct Integer instances, each with the value 42.

ArrayList<Integer> one = new ArrayList<Integer>();
one.add(x);
one.add(x);

The first ArrayList holds one of the Integer instances, twice, and
has nothing to do with the other.

ArrayList<Integer> two = new ArrayList<Integer>();
two.add(x);
two.add(y);

The second ArrayList holds both Integer instances, once each.

If you serialize `one' and read it back again, you'll get an
ArrayList with two references to the same Integer. Reading it back
will produce one Integer, not two. There will be two objects in
the serialized stream: One ArrayList and one Integer, plus enough
additional information to reassemble them. (Actually, there will
probably be additional objects: The ArrayList owns an array, which
is an object in its own right, and perhaps there might be others.
But there'll be two "visible" objects in the stream.)

If you serialize `two' and read it back, you'll get an ArrayList
with two references to two distinct Integers: Three "visible" objects
in all.

It's all right to serialize an object graph and store it on disk.
It is *not* all right to try to update the serialization in place,
nor to modify the object and expect a re-serialization to have the
same size. If you need in-place operations or same-size guarantees,
you'll need to invent a different external representation for your data.
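
The `one' versus `two' distinction can be observed by round-tripping both lists through serialization and comparing references with ==. The following is a sketch under assumptions: the class GraphRoundTrip and its helpers are invented names, and it uses the deprecated Integer constructor on purpose, since that is a simple way to force two distinct instances with the same value.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;

// Hypothetical demo class for the object-graph point made above.
class GraphRoundTrip {
    // Serializes the list to memory and reads it back as a fresh object graph.
    @SuppressWarnings("unchecked")
    static ArrayList<Integer> roundTrip(ArrayList<Integer> list) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(list);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        try (ObjectInputStream ois = new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray()))) {
            return (ArrayList<Integer>) ois.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Like `one`: the same Integer instance added twice.
    @SuppressWarnings({"deprecation", "removal"})
    static ArrayList<Integer> shared() {
        Integer x = new Integer(42);
        ArrayList<Integer> one = new ArrayList<Integer>();
        one.add(x);
        one.add(x);
        return one;
    }

    // Like `two`: two distinct Integer instances with equal values.
    @SuppressWarnings({"deprecation", "removal"})
    static ArrayList<Integer> distinct() {
        ArrayList<Integer> two = new ArrayList<Integer>();
        two.add(new Integer(42));
        two.add(new Integer(42));
        return two;
    }

    public static void main(String[] args) {
        ArrayList<Integer> one = roundTrip(shared());
        ArrayList<Integer> two = roundTrip(distinct());
        System.out.println("one shares an instance: " + (one.get(0) == one.get(1)));
        System.out.println("two shares an instance: " + (two.get(0) == two.get(1)));
    }
}
```

The restored `one' should still hold a single shared Integer, while the restored `two' should hold two separate ones, which is why the two streams differ in size even though both lists contain 42 twice.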

--
Eric Sosman
eso...@ieee-dot-org.invalid

Patricia Shanahan

Nov 20, 2011, 5:48:12 PM
On 11/20/2011 2:42 PM, Stefan Ram wrote:
> Arne Vajhøj<ar...@vajhoej.dk> writes:
>> C/C++ pointers has certainly caused a lot of problems over the
>> years.
>
> C serves as a »portable, abstract machine language«, so the
> C pointers are inherited machine addresses from machine
> languages, where one can freely add machine addresses and
> numbers. But, after all, C already adds some type safety and
> abstraction. So, C still makes sense as the first layer on
> top of the bare metal. And C cannot be blamed for someone
> choosing C where it is not appropriate.

My main concern with C's pointers is that they were called "pointers",
not "addresses". They behave far more like assembly language addresses
than like something more abstract, whose only job is to point.

Patricia

Patricia Shanahan

Nov 20, 2011, 5:50:26 PM
On 11/20/2011 1:11 PM, sara wrote:
...
> The problem is I need to write multiple arraylists on disk and later
> on I update the elements of them. I store the starting location of
> arraylists and their size such that later I can refer to them. If the
> size of objects change then it messes up! Could you please help?
...

I think we need to go up a level. Serializing ArrayList is not the
answer if you need fixed space. What is the question? What objective are
you trying to achieve?

Patricia

Arne Vajhøj

Nov 20, 2011, 6:06:22 PM
On 11/20/2011 4:11 PM, sara wrote:
> Here is the code:
>
> ArrayList<Integer> tmp=new ArrayList<Integer>();
> tmp.add(-1);
> tmp.add(-1);
> System.out.println(DiGraph.GetBytes(tmp).length);
> tmp.set(0, 10);
> System.out.println(DiGraph.GetBytes(tmp).length);
>
>
> public static byte[] GetBytes(Object v) {
> ByteArrayOutputStream bos = new ByteArrayOutputStream();
> ObjectOutputStream oos;
> try {
> oos = new ObjectOutputStream(bos);
> oos.writeObject(v);
> oos.flush();
> oos.close();
> bos.close();
> } catch (IOException e) {
> e.printStackTrace();
> }
> byte[] data = bos.toByteArray();
> return data;
> }

That code measures the size of the serialized ArrayList.

It does not reflect how much memory the list takes up.

And you should not use serialization for persistent
storage.
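
One alternative to serialization for the stated use case (several lists on disk, each found by a stored offset, with elements updated later) is a file of raw 4-byte ints updated in place with RandomAccessFile. This is a rough sketch, not a recommendation of a specific design; IntFileDemo and all of its methods are hypothetical names.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;

// Hypothetical sketch: int records stored at known byte offsets in one file.
class IntFileDemo {
    // Creates an empty temporary file that is removed when the JVM exits.
    static File newTempFile() {
        try {
            File f = File.createTempFile("intlists", ".bin");
            f.deleteOnExit();
            return f;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes all values as 4-byte ints starting at the given byte offset.
    static void writeAll(File f, long offset, int[] values) {
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
            raf.seek(offset);
            for (int v : values) {
                raf.writeInt(v);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Overwrites one element in place; the record's size does not change.
    static void set(File f, long offset, int index, int value) {
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
            raf.seek(offset + 4L * index);
            raf.writeInt(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads one element back from its slot.
    static int get(File f, long offset, int index) {
        try (RandomAccessFile raf = new RandomAccessFile(f, "r")) {
            raf.seek(offset + 4L * index);
            return raf.readInt();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        File f = newTempFile();
        writeAll(f, 0L, new int[] {-1, -1});   // first list: bytes 0..7
        writeAll(f, 8L, new int[] {7, 8, 9});  // second list: bytes 8..19
        set(f, 0L, 0, 10);                     // update first list in place
        System.out.println(f.length() + " bytes, first element = " + get(f, 0L, 0));
    }
}
```

Because every element occupies a fixed 4-byte slot, an update never shifts the records that follow, so stored offsets and sizes stay valid. Growing a list would still need a separate strategy, which this sketch does not address.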

Arne


Lew

Nov 20, 2011, 11:28:16 PM
On Sunday, November 20, 2011 1:11:00 PM UTC-8, sara wrote:
> Here is the code:
>
> ArrayList<Integer> tmp=new ArrayList<Integer>();

*DO NOT USE TAB CHARACTERS TO INDENT USENET CODE LISTINGS!*

> tmp.add(-1);
> tmp.add(-1);
> System.out.println(DiGraph.GetBytes(tmp).length);
> tmp.set(0, 10);
> System.out.println(DiGraph.GetBytes(tmp).length);
>
>
> public static byte[] GetBytes(Object v) {
> ByteArrayOutputStream bos = new ByteArrayOutputStream();
> ObjectOutputStream oos;
> try {
> oos = new ObjectOutputStream(bos);
> oos.writeObject(v);
> oos.flush();
> oos.close();
> bos.close();
> } catch (IOException e) {
> e.printStackTrace();
> }
> byte[] data = bos.toByteArray();
> return data;
> }
>
> The problem is I need to write multiple arraylists on disk and later

The problem is that the code you posted won't compile.

> on I update the elements of them. I store the starting location of
> arraylists and their size such that later I can refer to them. If the
> size of objects change then it messes up! Could you please help?

Java changes the sizes of things in surprising ways, and makes no promises about the size of an 'ArrayList' in the way you're asking.

What do you really want to do?

> On Nov 20, 1:05 pm, markspace <-@.> wrote:

*DO NOT TOP-POST!*

--
Lew

Lew

Nov 20, 2011, 11:44:30 PM
Eric Sosman wrote:
> Stefan Ram wrote:
>> Eric Sosman writes:
>>> If you're coming from a C background, a rough analogy is that
>>> the ArrayList holds "pointers" to the objects it holds, not copies
>>> of those objects.
>>
>> An ArrayList /does/ hold pointers (in the sense of Java),
>> this is not just »a rough analogy«:
>>
>> »(...) reference values (...) are pointers«
>
> They're "pointers" in Java's terms, but Java is considerably

They're "pointers" in programming terms, not just Java's.

> more restrictive about what you can do with a "pointer" than C is.

So?

> You cannot, for example, print the value of a Java reference; you
> can do so in C. You cannot convert a Java reference to or from an
> integer; C allows it (with traps for the unwary). Java references
> obey a type hierarchy; C's types (and hence the pointers to them)
> are unrelated. And so on, and so on: Little niggly differences.
> Since Java's references support (and prohibit) a different set of
> operations than C's pointers do, I maintain they're as similar as
> dogs and wolves, and as different.

Dogs and wolves are the same species. They can interbreed.

Java pointers *are* pointers - and that's all they are. They don't pretend to do arithmetic on themselves. That does not make them less a pointer.

The essence of pointers is that they point. The implicit 'const' on them (in C terms) doesn't change that a jot.

> Put it this way: If I had told sara "An ArrayList contains
> C-style pointers to the objects it holds," would I have been
> telling the truth?

Why would you say such a bone-headed thing, and what difference does it make? A pointer is a pointer still, if it but points, though you cannot increment it.

No one is claiming that they're "C-style" pointers, so we'll throw that red herring back in the water.

--
Lew

Roedy Green

Nov 21, 2011, 1:25:35 AM
On Sun, 20 Nov 2011 13:01:44 -0800 (PST), sara <saras...@gmail.com>
wrote, quoted or indirectly quoted someone who said :
What code did you use to convert to byte[]?

An ArrayList consists of a base ArrayList object, an array-of-pointers
object, and one object for each Integer. If the integers are small,
duplicates share an object: e.g. two 1s in the list will point to the
same canonical Integer object.

Each object (including all the Integers) has perhaps 8 to 16 bytes of
overhead. So it is fairly complicated to figure out how much RAM this
thing uses. It is not like a C array where you just multiply 4 x slots.

An int[] is much simpler.
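
The canonical-Integer behaviour mentioned above can be seen directly with a reference comparison. A small sketch follows, with IntegerCacheDemo and sameObject as made-up names; it relies on the language guarantee that boxing values from -128 to 127 yields cached objects, and makes no claim about larger values, which may or may not be cached by a given JVM.

```java
// Hypothetical demo of the small-value Integer cache.
class IntegerCacheDemo {
    // Boxes the same int twice and reports whether both boxes are one object.
    static boolean sameObject(int value) {
        Integer a = value;  // autoboxing goes through Integer.valueOf
        Integer b = value;
        return a == b;      // reference comparison, not equals()
    }

    public static void main(String[] args) {
        System.out.println("1 boxed twice is one object: " + sameObject(1));
        // Outside -128..127 the result is implementation-dependent.
        System.out.println("100000 boxed twice is one object: " + sameObject(100000));
    }
}
```

This is part of why per-element memory is hard to estimate: whether two list slots cost one Integer object or two depends on the values and on how they were created.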

--
Roedy Green Canadian Mind Products
http://mindprod.com
I can't come to bed just yet. Somebody is wrong on the Internet.

Arne Vajhøj

Nov 25, 2011, 10:11:18 PM
On 11/21/2011 1:25 AM, Roedy Green wrote:
> On Sun, 20 Nov 2011 13:01:44 -0800 (PST), sara<saras...@gmail.com>
> wrote, quoted or indirectly quoted someone who said :
>
>> I create an Arraylist<Integer> tmp and add some integers to it.
>> Afterward, I measure the size of tmp in bytes (by converting tmp to
>> bytes array). Assume the result is byte[] C. However, when I update an
>> element of tmp, and measure size of tmp in bytes again, the result is
>> different than C!
>> Why this is the case?
>
> What code did you use to convert to byte[]?

The code was posted in a followup.

Arne

Arne Vajhøj

Nov 25, 2011, 10:12:23 PM
Since C does not have both constructs, it is pure terminology.

Arne


Arne Vajhøj

Nov 25, 2011, 10:16:52 PM
On 11/20/2011 11:44 PM, Lew wrote:
> Eric Sosman wrote:
>> Stefan Ram wrote:
>>> Eric Sosman writes:
>>>> If you're coming from a C background, a rough analogy is that
>>>> the ArrayList holds "pointers" to the objects it holds, not copies
>>>> of those objects.
>>>
>>> An ArrayList /does/ hold pointers (in the sense of Java),
>>> this is not just »a rough analogy«:
>>>
>>> »(...) reference values (...) are pointers«

>> You cannot, for example, print the value of a Java reference; you
>> can do so in C. You cannot convert a Java reference to or from an
>> integer; C allows it (with traps for the unwary). Java references
>> obey a type hierarchy; C's types (and hence the pointers to them)
>> are unrelated. And so on, and so on: Little niggly differences.
>> Since Java's references support (and prohibit) a different set of
>> operations than C's pointers do, I maintain they're as similar as
>> dogs and wolves, and as different.
>
> Dogs and wolves are the same species. They can interbreed.
>
> Java pointers *are* pointers - and that's all they are. They don't pretend to do arithmetic on themselves. That does not make them less a pointer.
>
> The essence of pointers is that they point. The implicit 'const' on them (in C terms) doesn't change that a jot.
>
>> Put it this way: If I had told sara "An ArrayList contains
>> C-style pointers to the objects it holds," would I have been
>> telling the truth?
>
> Why would you say such a bone-headed thing, and what difference does it make? A pointer is a pointer still, if it but points, though you cannot increment it.
>
> No one is claiming that they're "C-style" pointers. so we'll throw that red herring back in the water.

What is the difference between pointers for someone from a C background
and C style pointers?

Arne


Lew

Nov 25, 2011, 11:15:18 PM
Arne Vajhøj wrote:
> What is the difference between pointers for someone from a C background
> and C style pointers?

"Pointers" are things defined generally for computer science and don't really depend on one's background any more than does the definition of "photon". "C-style pointer" is a colloquial way to express the connotative package of assumptions about the attributes and behaviors of pointers made by those with a background in "C", or really any non-Java language. One of Java's innovations (or errors, depending on your outlook) was the removal of most of the attributes and behaviors those with a C background tend to associate with pointers. Only the core notion that they point to the location of an object remained, pretty much. No more arithmetic, no more wild pointing into spaces beyond allocated memory, no more aliasing pointers to different types. The notion of pointer was bound much more tightly to the underlying type in Java than in C. In keeping, somewhat, with C/C++ usage and more general computer programming terminology, Java primarily uses the term "reference", which I suppose carries more connotations of fixed target and tightly-bound type. Still, they took pains to note in the JLS that references are pointers.

So, summary: "pointers" are what they are regardless of one's background, "C-style pointers" are pointers implemented with the attributes and behaviors that pointers possess in C.

--
Lew
