StringBuffer & String and the shared paradigm... WHY?

Johan Compagner

Oct 16, 2000, 3:00:00 AM

Hi,

I don't know how many of you really know how String and StringBuffer work together (through toString()), but it goes
like this:

When you want a String out of a StringBuffer, StringBuffer.toString() is called, and that calls the String(StringBuffer) constructor of
String.
There the char array of the StringBuffer is set as the char array of the String, with offset = 0 and count = stringBuffer.length().
This means that you can have a (worst case) 100% too big char array!
For example, the default size of the StringBuffer char array is 16. Now you add 17 chars to it: then the char array is 16*2+2 = 34.
And then you call toString(), so the String of 17 chars has its 17 chars in a 34-char array! That's 100% too much!
So let's say the average waste is half of the worst case: that still means that the Strings created from a
StringBuffer have an average of 50% waste!
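
The spare room described above can be measured with the public capacity() and length() accessors. This is a minimal sketch, assuming a JDK whose StringBuffer starts at 16 chars and grows to roughly twice the old capacity plus 2 (as the ensureCapacity() documentation describes). The class and method names are made up for illustration. It only measures the buffer's unused space; whether toString() shares or copies the array depends on the JDK version.

```java
class CapacityDemo {
    // Unused chars in the buffer's backing storage.
    static int spareChars(StringBuffer sb) {
        return sb.capacity() - sb.length();
    }

    // Builds a default-sized buffer and appends n chars, one at a time.
    static StringBuffer fill(int n) {
        StringBuffer sb = new StringBuffer(); // default capacity is 16
        for (int i = 0; i < n; i++) {
            sb.append('x');
        }
        return sb;
    }

    public static void main(String[] args) {
        StringBuffer sb = fill(17); // one char more than the default capacity
        System.out.println("length   = " + sb.length());
        System.out.println("capacity = " + sb.capacity());
        System.out.println("spare    = " + spareChars(sb));
    }
}
```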

After that there are 2 choices: The StringBuffer is used again, or the StringBuffer is discarded:

Used again:
Then the StringBuffer's array is copied into a new one as big as the current one.
So you suddenly have 2 big arrays! In the above example, you add 1 char and call toString() again: now you have a second String with 18
chars in a 34-char array.
So when a StringBuffer is used again this approach is horrible. Much better would be that when you call toString(), the String makes an
array of exactly the size needed and copies the used portion of the StringBuffer's array into it. After that the StringBuffer
can still use its own char array and the String has a waste of 0%.
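
The two strategies compared above can be sketched with a toy model. This is NOT the JDK source: ToyBuffer and its methods are hypothetical names, and the growth rule (twice the old size plus 2) is an assumption taken from the ensureCapacity() documentation. The "shared" path hands out the backing array and copies it on the next write; the "exact" path copies only the used chars.

```java
import java.util.Arrays;

// Toy model of sharing vs. exact-size copying; not the real StringBuffer.
class ToyBuffer {
    private char[] value = new char[16];
    private int count = 0;
    private boolean shared = false; // true once a "string" aliases value

    void append(char c) {
        if (shared || count == value.length) {
            // Grow if full; otherwise copy at the same size because the
            // old array now belongs to a previously handed-out "string".
            int newCap = (count == value.length) ? value.length * 2 + 2 : value.length;
            value = Arrays.copyOf(value, newCap);
            shared = false;
        }
        value[count++] = c;
    }

    // Shared strategy: no copy now, but the whole array stays alive.
    char[] toSharedArray() {
        shared = true;
        return value;
    }

    // Copying strategy: exact-size array, buffer keeps its own storage.
    char[] toExactArray() {
        return Arrays.copyOf(value, count);
    }

    int length() {
        return count;
    }

    public static void main(String[] args) {
        ToyBuffer b = new ToyBuffer();
        for (int i = 0; i < 17; i++) {
            b.append('a');
        }
        char[] first = b.toSharedArray();
        b.append('b'); // forces a full-size copy of the backing array
        char[] second = b.toSharedArray();
        System.out.println("shared arrays: " + first.length + " and " + second.length);
        System.out.println("exact copy:    " + b.toExactArray().length);
    }
}
```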

So when you keep using a StringBuffer after calling toString(), the shared approach is just wrong.
But if you don't use it again, does the shared approach have advantages?
The String is still on average 50% too big! The one advantage I see is that you don't have to make a new array and then copy
the section.
But does that justify arrays that are on average 50% too large in all the Strings that are made by a StringBuffer (so also the one from str
+= "test";)??

Happily there is a choice when you are working with a StringBuffer:
String str = stringBuffer.substring(0, stringBuffer.length());
This copies the array into a new one and the StringBuffer keeps its current one.
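
A small sketch of that workaround in a reuse loop. The helper name snapshot() and the class name are made up for illustration; the only calls relied on are StringBuffer.substring(int, int), setLength(int) and capacity(). Whether this actually saves memory compared to toString() depends on how the JDK in use implements toString().

```java
class ReuseDemo {
    // Takes the buffer's current contents via substring(0, length()),
    // the call suggested in the post.
    static String snapshot(StringBuffer sb) {
        return sb.substring(0, sb.length());
    }

    public static void main(String[] args) {
        StringBuffer sb = new StringBuffer(64);

        sb.append("first");
        String a = snapshot(sb);

        sb.setLength(0); // clear the contents but keep the same buffer
        sb.append("second");
        String b = snapshot(sb);

        System.out.println(a + " / " + b);
        System.out.println("capacity still " + sb.capacity());
    }
}
```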

Am I missing something?

Johan Compagner
J-COM

Bernie Lofaso

Oct 16, 2000, 3:00:00 AM

Johan Compagner wrote:

>
> So when a StringBuffer is used again this approach is horrible. Much better would be that when you call toString(), the String makes an
> array of exactly the size needed and copies the used portion of the StringBuffer's array into it. After that the StringBuffer
> can still use its own char array and the String has a waste of 0%.
>
> So when you keep using a StringBuffer after calling toString(), the shared approach is just wrong.
> But if you don't use it again, does the shared approach have advantages?

The advantage occurs in that you don't have to allocate new storage and perform the copy. This advantage, of course, is only realized
if you don't reuse the StringBuffer.

>
> The String is still on average 50% too big! The one advantage I see is that you don't have to make a new array and then copy
> the section.
> But does that justify arrays that are on average 50% too large in all the Strings that are made by a StringBuffer (so also the one from str
> += "test";)??
>
> Happily there is a choice when you are working with a StringBuffer:
> String str = stringBuffer.substring(0, stringBuffer.length());
> This copies the array into a new one and the StringBuffer keeps its current one.
>
> Am I missing something?

No, I think you understand pretty well. Like you, I was dismayed at this aspect of using StringBuffers. I had a program which
allocated a rather large StringBuffer with the intent that it would be sufficiently large for my needs and never reallocate memory.
When I had poor performance and high memory usage, I found out that each toString() call I made on the StringBuffer was duplicating the
large buffer rather than copying just the string. I reported this to Sun and it was assigned a bug number, but in review they claimed that
it was "a feature". Their response was that StringBuffers should not be reused. To me, this seems totally bogus. The implied, if
not stated, purpose of StringBuffers is to be reused. Fortunately, you've found the workaround, which is the same one that I used.
Glad to know I'm not the only person who found this behavior mystifying.

Bernie Lofaso

Dirk Bosmans

Oct 17, 2000, 12:43:17 AM

I'm reacting to the following parts of "Johan Compagner" <jo...@wxs.nl>'s article in
comp.lang.java.programmer on Mon, 16 Oct 2000 09:51:44 +0200:

> Hi,
>
> I don't know how many of you really know how String and StringBuffer work together (through toString()), but it goes
> like this:
>
> When you want a String out of a StringBuffer, StringBuffer.toString() is called, and that calls the String(StringBuffer) constructor of
> String.
> There the char array of the StringBuffer is set as the char array of the String, with offset = 0 and count = stringBuffer.length().
> This means that you can have a (worst case) 100% too big char array!
> For example, the default size of the StringBuffer char array is 16. Now you add 17 chars to it: then the char array is 16*2+2 = 34.
> And then you call toString(), so the String of 17 chars has its 17 chars in a 34-char array! That's 100% too much!
> So let's say the average waste is half of the worst case: that still means that the Strings created from a
> StringBuffer have an average of 50% waste!
>
> After that there are 2 choices: The StringBuffer is used again, or the StringBuffer is discarded:
>
> Used again:
> Then the StringBuffer's array is copied into a new one as big as the current one.
> So you suddenly have 2 big arrays! In the above example, you add 1 char and call toString() again: now you have a second String with 18
> chars in a 34-char array.
> So when a StringBuffer is used again this approach is horrible. Much better would be that when you call toString(), the String makes an
> array of exactly the size needed and copies the used portion of the StringBuffer's array into it. After that the StringBuffer
> can still use its own char array and the String has a waste of 0%.
>
> So when you keep using a StringBuffer after calling toString(), the shared approach is just wrong.
> But if you don't use it again, does the shared approach have advantages?
> The String is still on average 50% too big! The one advantage I see is that you don't have to make a new array and then copy
> the section.
> But does that justify arrays that are on average 50% too large in all the Strings that are made by a StringBuffer (so also the one from str
> += "test";)??
>
> Happily there is a choice when you are working with a StringBuffer:
> String str = stringBuffer.substring(0, stringBuffer.length());
> This copies the array into a new one and the StringBuffer keeps its current one.
>
> Am I missing something?
>

> Johan Compagner
> J-COM
>
>

There is also a similar memory danger with String.substring() in some cases.

I had a program that read lots of (sometimes large) text files into one
String each, and then took a substring of a few words, keeping no reference to
the original file-in-a-string. After some limit tests, memory problems showed up in
production. It turned out that String.substring() keeps the original char[]
reference, and just sets its offset and length to some substring within that
char[] (which saves memory in most common cases, I guess). The solution was the
sequence
String myWords = new String(fileInAString.substring(i, j));
which allocates a new char[] and copies only the necessary chars into it.
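
As a runnable sketch of that sequence: the class name, the helper detachedSubstring() and the filler data are made up for illustration. It assumes a JDK where substring() shares the parent's char[] as described above; on a JDK where substring() already copies, the extra new String(...) is harmless but redundant.

```java
class SubstringCopyDemo {
    // Extracts [begin, end) and wraps it in new String(...), so the result
    // does not depend on the (possibly huge) source string's storage.
    static String detachedSubstring(String source, int begin, int end) {
        return new String(source.substring(begin, end));
    }

    public static void main(String[] args) {
        // Stand-in for a large file read into one String.
        StringBuilder file = new StringBuilder();
        for (int i = 0; i < 100000; i++) {
            file.append('.');
        }
        file.append("a few words");
        String fileInAString = file.toString();

        int j = fileInAString.length();
        int i = j - "a few words".length();
        String myWords = detachedSubstring(fileInAString, i, j);
        System.out.println(myWords);
    }
}
```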

Greetings,
Dirk Bosmans

http://users.belgacombusiness.net/arci/
- Applicet Framework: turns Applets into Applications
- ArciMath BigDecimal: now with BigDecimalFormat

Jon Skeet

Oct 17, 2000, 3:00:00 AM

Dirk.B...@tijd.com wrote:
> There is also a similar memory danger with String.substring() in some cases.
>
> I had a program that read lots of (sometimes large) text files into one
> String each, and then took a substring of a few words, keeping no reference to
> the original file-in-a-string. After some limit tests, memory problems showed up in
> production. It turned out that String.substring() keeps the original char[]
> reference, and just sets its offset and length to some substring within that
> char[] (which saves memory in most common cases, I guess). The solution was the
> sequence
> String myWords = new String(fileInAString.substring(i, j));
> which allocates a new char[] and copies only the necessary chars into it.

This is also useful when reading a file in a line at a time - by default,
I believe it allocates an 80 character buffer, which may be much too
large if you're reading in (for example) a "word-per-line" dictionary.
Creating a new string from each returned string again solves the problem.
(Patricia originally alerted me to that one.)
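
A sketch of the line-at-a-time case, assuming the lines come from BufferedReader.readLine() (the post does not name the class, so that is a guess). The class name and the String-based convenience overload are made up for illustration. Whether the extra copy saves memory depends on how the JDK in use builds the returned lines.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class LineCopyDemo {
    // Reads every line and stores a fresh String for each one.
    static List<String> readWords(BufferedReader in) throws IOException {
        List<String> words = new ArrayList<>();
        String line;
        while ((line = in.readLine()) != null) {
            words.add(new String(line)); // copy, as suggested in the post
        }
        return words;
    }

    // Convenience overload so the sketch runs without a real file.
    static List<String> readWords(String text) {
        try (BufferedReader in = new BufferedReader(new StringReader(text))) {
            return readWords(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for a "word-per-line" dictionary file.
        System.out.println(readWords("apple\nbanana\ncherry\n"));
    }
}
```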

--
Jon Skeet - sk...@pobox.com
http://www.pobox.com/~skeet/

Dale King

Oct 17, 2000, 3:00:00 AM

Jon Skeet wrote:

>
> Dirk.B...@tijd.com wrote:
> > There is also a similar memory danger with String.substring() in some cases.
> >
> > I had a program that read lots of (sometimes large) text files into one
> > String each, and then took a substring of a few words, keeping no reference to
> > the original file-in-a-string. After some limit tests, memory problems showed up in
> > production. It turned out that String.substring() keeps the original char[]
> > reference, and just sets its offset and length to some substring within that
> > char[] (which saves memory in most common cases, I guess). The solution was the
> > sequence
> > String myWords = new String(fileInAString.substring(i, j));
> > which allocates a new char[] and copies only the necessary chars into it.
>

> This is also useful when reading a file in a line at a time - by default,
> I believe it allocates an 80 character buffer, which may be much too
> large if you're reading in (for example) a "word-per-line" dictionary.
> Creating a new string from each returned string again solves the problem.
> (Patricia originally alerted me to that one.)

Why don't you guys submit some requests for workarounds for these? Be
sure to frame it such that the current default behavior is not changed,
but provisions are added to request the alternate behavior. The default
behavior does make sense in some situations. Your current workaround is
not great because it still requires creating a String object that gets
discarded.

For the case of substring they could add an overloaded version that adds
a boolean parameter to specify if the data is to be shared. Or another
suggestion would be to add a constructor that took a String, int offset,
and int length which had the same effect. Another solution might be to
have a separate method for String that forces it to create a new copy of
its char array if it is currently only using part of it.
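
To make the constructor idea concrete, here is a hypothetical static helper with the same shape as the proposed String(String, int offset, int length). No such constructor exists in the JDK as far as this sketch assumes; the class and method names are invented. It only uses String.getChars() and the String(char[]) constructor. Note that it still allocates a temporary char[], so it illustrates the proposed signature more than it removes the garbage.

```java
class StringCopies {
    // Hypothetical stand-in for the proposed constructor: returns a new
    // String holding only src[offset .. offset + length).
    static String copyOf(String src, int offset, int length) {
        char[] buf = new char[length];
        // Copy just the requested range out of the source string.
        src.getChars(offset, offset + length, buf, 0);
        return new String(buf);
    }

    public static void main(String[] args) {
        String fileInAString = "lots of text ... a few words ... more text";
        int offset = fileInAString.indexOf("a few words");
        System.out.println(copyOf(fileInAString, offset, "a few words".length()));
    }
}
```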

Similar additions could be made for the sharing between StringBuffer and
String. I thought the String constructor that takes a StringBuffer
argument would do this based on the docs, but the docs are misleading.

--
--- Dale King

Dirk Bosmans

Oct 18, 2000, 12:29:30 AM

I'm reacting to the following parts of Dale King <Ki...@TCE.com>'s article in
comp.lang.java.programmer on Tue, 17 Oct 2000 11:43:12 -0500:

> Jon Skeet wrote:
> >
> > Dirk.B...@tijd.com wrote:

> > > There is also a similar memory danger with String.substring() in some cases.
> > >
> > > I had a program that read lots of (sometimes large) text files into one
> > > String each, and then took a substring of a few words, keeping no reference to
> > > the original file-in-a-string. After some limit tests, memory problems showed up in
> > > production. It turned out that String.substring() keeps the original char[]
> > > reference, and just sets its offset and length to some substring within that
> > > char[] (which saves memory in most common cases, I guess). The solution was the
> > > sequence
> > > String myWords = new String(fileInAString.substring(i, j));
> > > which allocates a new char[] and copies only the necessary chars into it.
> >

> > This is also useful when reading a file in a line at a time - by default,
> > I believe it allocates an 80 character buffer, which may be much too
> > large if you're reading in (for example) a "word-per-line" dictionary.
> > Creating a new string from each returned string again solves the problem.
> > (Patricia originally alerted me to that one.)
>
> Why don't you guys submit some requests for workarounds for these? Be
> sure to frame it such that the current default behavior is not changed,
> but provisions are added to request the alternate behavior. The default
> behavior does make sense in some situations. Your current workaround is
> not great because it still requires creating a String object that gets
> discarded.
>
> For the case of substring they could add an overloaded version that adds
> a boolean parameter to specify if the data is to be shared. Or another
> suggestion would be to add a constructor that took a String, int offset,
> and int length which had the same effect. Another solution might be to
> have a separate method for String that forces it to create a new copy of
> its char array if it is currently only using part of it.
>
> Similar additions could be made for the sharing between StringBuffer and
> String. I thought the String constructor that takes a StringBuffer
> argument would do this based on the docs, but the docs are misleading.

I agree, a new String(String src, int offset, int length) to use instead of
substring() would be fine, but a String.trimToSize() would violate immutability.
But still, aren't these ugly details that should be hidden from the programmer
in real OO style: just hack on, machines will get stronger anyway?

Jon Skeet

Oct 18, 2000, 3:00:00 AM

Ki...@TCE.com wrote:
> > This is also useful when reading a file in a line at a time - by default,
> > I believe it allocates an 80 character buffer, which may be much too
> > large if you're reading in (for example) a "word-per-line" dictionary.
> > Creating a new string from each returned string again solves the problem.
> > (Patricia originally alerted me to that one.)
>
> Why don't you guys submit some requests for workarounds for these? Be
> sure to frame it such that the current default behavior is not changed,
> but provisions are added to request the alternate behavior. The default
> behavior does make sense in some situations. Your current workaround is
> not great because it still requires creating a String object that gets
> discarded.

Indeed. On the other hand, hopefully such issues are already understood
to be a *slight* problem within Sun, and there are much bigger areas
where String/StringBuffer could be improved, as we've mentioned several
times in the past. I'll think about sending in an RFE though...
