StringBuilder is obviously more efficient at string concatenation than the
old '+=' method. However, for relatively large concatenations (i.e., 20-30k),
what are the performance differences (if any, with something as trivial as
this) between initializing a new instance of StringBuilder with a specified
capacity and initializing a new instance without one? (The final length is
not fixed.)
That is, what are the performance differences between:
StringBuilder sb1 = new StringBuilder(30000);
and
StringBuilder sb1 = new StringBuilder();
Does this make a huge difference compared to '+='?
Cheers,
-k
If you initialize the StringBuilder with a capacity, you are better off once
the string grows past the default size, because the builder doesn't have to
reallocate space. That only comes into play if you exceed the default size,
so the difference between the two is only noticeable when you cross that
boundary. It's hard to say in absolute terms; it depends on the situation.
If possible, initialize the capacity, but even if you don't, you'll be much
better off than with +=.
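A minimal sketch of the two constructors from the question, assuming a modern C# compiler. The class name CapacityDemo and the method Build are hypothetical names used only for illustration. Both paths produce the same string; the only difference is whether the builder has to grow along the way.

```csharp
using System;
using System.Text;

public static class CapacityDemo
{
    // Builds a string of `length` characters. When initialCapacity is null,
    // the parameterless constructor (default capacity) is used instead.
    public static string Build(int length, int? initialCapacity)
    {
        StringBuilder sb = initialCapacity.HasValue
            ? new StringBuilder(initialCapacity.Value)
            : new StringBuilder();

        for (int i = 0; i < length; i++)
        {
            sb.Append('x');
        }
        return sb.ToString();
    }
}
```

Calling CapacityDemo.Build(20000, 30000) and CapacityDemo.Build(20000, null) should return equal strings. Whether the first is measurably faster is something to time in your own application.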
HTH,
Bill
"Kevin C" <kevin-c...@sbcglobal.net> wrote in message
news:E0yjb.1401$FB1...@newssvr27.news.prodigy.com...
thanks for the quick reply -- that makes sense.
As a footnote though, which has the most negative (theoretical?) effect on
performance:
1) over-initializing an instance, i.e., setting capacity at 30,000 characters
when you only need 20,000
or
2) under-initializing an instance, i.e., setting capacity at 10,000 characters
and having the StringBuilder class dynamically allocate more room for the
additional 10,000 characters when you try to append 20,000.
-k
"William Ryan" <dotne...@comcast.nospam.net> wrote in message
news:%23bA$LR$kDHA...@TK2MSFTNGP09.phx.gbl...
>
> As a footnote though, which has the most negative (theoretical?) effect
> on performance:
>
> 1) over-initializing an instance, i.e., setting capacity at 30,000
> characters when you only need 20,000
> or
> 2) under-initializing an instance, i.e., setting capacity at 10,000
> characters and having the StringBuilder class dynamically allocate
> more room for the additional 10,000 characters when you try to append
> 20,000.
>
I would say that number 2 has the most negative impact. When you append
characters that exceed the capacity of the StringBuilder, it must allocate
memory large enough to hold the new string, copy the existing characters to
it, and then add the new characters. Whereas in number 1, the allocation is
already done and it just has to append the new data.
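The growth described above can be observed through the Capacity property. This is a small sketch, not a statement about how any particular runtime grows its buffer. The class name GrowthDemo is hypothetical, and the exact capacity reported after growth is an implementation detail that may differ between framework versions.

```csharp
using System;
using System.Text;

public static class GrowthDemo
{
    // Returns { capacity before, capacity after } for a builder that is
    // created with initialCapacity and then receives charsToAppend chars.
    public static int[] CapacityBeforeAndAfter(int initialCapacity, int charsToAppend)
    {
        StringBuilder sb = new StringBuilder(initialCapacity);
        int before = sb.Capacity;

        // Append(char, int) appends the character the given number of times.
        sb.Append('x', charsToAppend);

        int after = sb.Capacity;
        return new int[] { before, after };
    }
}
```

For case 2 in the question, GrowthDemo.CapacityBeforeAndAfter(10000, 20000) should report a larger capacity after the append than before it, since 20,000 characters cannot fit in a 10,000-character buffer.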
Chris
Jerry
"Chris Dunaway" <duna...@lunchmeatsbcglobal.net> wrote in message
news:Xns941679AE7B6A8du...@207.46.248.16...
Do you have any evidence of this? This is certainly the first I've
heard of it. As far as I'm aware, StringBuilder has a buffer, and once
that is full, the buffer is copied and resized. That's the view that
the rotor source suggests, too.
--
Jon Skeet - <sk...@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
"Jerry III" <jerr...@hotmail.com> wrote in message
news:%23ta9vnA...@TK2MSFTNGP11.phx.gbl...
For more insights on Strings:
Strings UNDOCUMENTED
http://www.codeproject.com/dotnet/strings.asp
Regards,
Fergus
// Copyright (c) 2002 Microsoft Corporation. All rights reserved.
//
** Class: StringBuilder
**
** Purpose: A prototype implementation of the StringBuilder
** class.
**
** Date: December 8, 1997
** Last Updated: March 31, 1998
**
namespace System.Text {
using System.Text;
using System.Runtime.Serialization;
using System;
using System.Runtime.CompilerServices;

[Serializable()] public sealed class StringBuilder {
Full details at:
http://dotnet.di.unipi.it/Content/sscli/docs/doxygen/fx/bcl/stringbuilder_8cs-source.html
Regards,
Fergus
Jerry
"Fergus Cooney" <filt...@tesco.net> wrote in message
news:u$ZAThBlD...@TK2MSFTNGP11.phx.gbl...
What do you mean by "just allocate large enough String"?
Could you give an example of concatenating many strings together in a
loop and allowing parts of the concatenation to be removed or replaced
where your code gives better performance than StringBuilder?
You're right - and shouldn't be disappointed - StringBuilder <is> a lot
more efficient (time-wise, at least). Because it doubles when it needs to
grow, the number of 'grows' is pretty minimal. Of course, if you have a 10MB
string to which you want to add a single character, it'll grab another 20MB to
do it!!
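To put a number on "pretty minimal", here is a simplified model of a buffer that doubles whenever it is full. It is only arithmetic, not the real StringBuilder code. The class name DoublingModel is hypothetical, and the starting capacity of 16 used in the example is the documented default capacity.

```csharp
using System;

public static class DoublingModel
{
    // Counts how many times a buffer that doubles on overflow must grow
    // before it can hold `needed` characters.
    public static int Grows(long startCapacity, long needed)
    {
        if (startCapacity < 1)
        {
            throw new ArgumentOutOfRangeException("startCapacity");
        }

        long capacity = startCapacity;
        int grows = 0;
        while (capacity < needed)
        {
            capacity *= 2;
            grows++;
        }
        return grows;
    }
}
```

Under this model, DoublingModel.Grows(16, 30000) is 11: the capacity goes 16, 32, 64, and so on up to 32,768, which is the first value that holds 30,000 characters.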
The trouble with allocating a huge string is that as soon as you do
anything with it, you'll get a new totally different string - with that
massive one left for the GC. Ouch! You can't insert <into> a string, but
that's exactly what the StringBuilder is designed for.
Regards,
Fergus
ps. Jon is a need-for-truth man. Getting things wrong and having JS correct
you happens to me too, lol - and then I know more than I did. So, too, does
anyone else who had the same misconceptions. :-)
Fortunately it also happens to me too. I'd far rather post my beliefs
and have them thoroughly disproved (as has happened several times) than
shut up and have people believe that I know what's going on when I
don't.
Basically, what I'm trying to say is that being wrong is something that
happens to absolutely everyone, and that I mean no disrespect when I
correct/question someone.
Personally I think it might be worth giving up some speed in ToString and
removing/replacing in order to make appending a lot faster. And if you're
going to ask - no, I don't have any code that proves any of this :(
Jerry
"Jon Skeet [C# MVP]" <sk...@pobox.com> wrote in message
news:MPG.19f9c23ad...@msnews.microsoft.com...
Ah - I have, although I agree that the most common case is just
appending.
> In those cases using existing code (those two classes) will be
> efficient, but if you only do concatenating you are still far better off
> making a guess of how long will your string be and preallocating the memory
> yourself - i.e. telling StringBuilder how much it should allocate in the
> constructor.
>
> Personally I think it might be worth giving up some speed in ToString and
> removing/replacing in order to make appending a lot faster. And if you're
> going to ask - no, I don't have any code that proves any of this :(
If you're never going to remove/replace/insert, you probably could
indeed improve the performance. Having said that, a quick attempt to do
so in an obvious way failed. One of the advantages of StringBuilder is
that if you don't need to expand the string in the end, the result of
ToString is just *there* with no further effort.
I might investigate this further though - see if I can come up with
something which beats StringBuilder in the simple case.
I suspect that StringBuilder is rarely the bottleneck in apps, however
- whereas simple repeated string concatenation in a loop easily could
be.
"Jon Skeet [C# MVP]" <sk...@pobox.com> wrote in message
news:MPG.19f8efa12...@msnews.microsoft.com...
Unfortunately, without more code, it's not worth a lot :(
If you *are* doing a lot of string concatenations, and you don't need
the results as strings between operations, it really *should* have been
quicker using StringBuilder.
If you ever get a separable bit of code which is demonstrating that
behaviour, I'd be interested to see it.
Jerry
"Jon Skeet [C# MVP]" <sk...@pobox.com> wrote in message
news:MPG.19fa4bde4...@msnews.microsoft.com...
I've seen your posts and know a bit about your mind. I've seen your site
and know a bit about your heart. 'Tis good for I respect you in both ways. ;-)
Regards,
Fergus
Yup, that's absolutely right. Note that I think the threshold for doing
this in Java may be much lower or even non-existent, as all string
concatenations in Java use StringBuffer anyway.
public static string GetID(...)
{
    return contextZero.ID + ciDelimiter + dvZero.ID + ciDelimiter +
           contextOne.ID + ciDelimiter + dvOne.ID + ciDelimiter +
           contextTwo.ID + ciDelimiter + dvTwo.ID + ciDelimiter +
           chDtls.ID + chDtls.IDExtension;
}
Each item in this statement is a string. It's returned in response to a
request for an ID. There are literally hundreds, if not thousands, of
statements like this in our code. And they are called LOTS of times --
anytime the ID of an object is required. They should be a critical
performance bottleneck because of this. We considered going with numeric IDs
and several other things, but strings have advantages for us. So I switched
to using StringBuilder.
To my amazement, I found that replacing these string concats with
StringBuilder equivalents actually reduced the measured performance of our
app. Anyone see a flaw or a reason that StringBuilder would falter in a
situation like this?
Dave
"Jon Skeet [C# MVP]" <sk...@pobox.com> wrote in message
news:MPG.19fc4b472...@msnews.microsoft.com...
Why does this matter?
Well, you are actually building a string. Not just building a buffer and
then copying to a string. So in the common case:
1) You create the buffer.
2) You append.
3) You append.
4) You append.
...
297345) You append.
297346) You get the string you just created.
For large strings, this is a TREMENDOUS performance win for the common case,
especially if you make the buffer big enough to hold the entire string.
This not only eliminates a multi-megabyte copy operation, but also prevents
you from having two copies of the huge string in memory at the same time.
It also gets you out of some messy GC.
Of course, if you don't use StringBuilder exactly this way, there can be
some performance side effects.
"Jon Skeet [C# MVP]" <sk...@pobox.com> wrote in message
news:MPG.19fa29536...@msnews.microsoft.com...
Ah no - a statement like that will already be concatenated pretty
efficiently.
> Each item in this statement is a string. It's returned in response to a
> request for an ID. There are literally hundreds, if not thousands, of
> statements like this in our code. And they are called LOTS of times --
> anytime the ID of an object is required. They should be a critical
> performance bottleneck because of this. We considered going with numeric IDs
> and several other things, but strings have advantages for us. So I switched
> to using StringBuilder.
>
> To my amazement, I found that replacing these string concats with
> StringBuilder equivalents actually reduced the measured performance of our
> app. Anyone see a flaw or a reason that StringBuilder would falter in a
> situation like this?
Yes - the above is a single step concatenation; it doesn't produce a
lot of intermediate strings. The thing to avoid would be converting the
above into stuff like:
public static string GetID(...)
{
    string ret = contextZero.ID;
    ret += ciDelimiter;
    ret += dvZero.ID;
    // etc
}
or worse still, doing a lot of concatenations in a loop.
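For the loop case, a minimal sketch of the two approaches side by side. The class name LoopConcat and both method names are hypothetical. The methods return identical strings; the difference is in how many intermediate strings get allocated along the way.

```csharp
using System;
using System.Text;

public static class LoopConcat
{
    // Repeated +=: every iteration allocates a new string and copies
    // everything accumulated so far into it.
    public static string WithPlusEquals(string[] parts)
    {
        string result = "";
        foreach (string part in parts)
        {
            result += part;
        }
        return result;
    }

    // StringBuilder: appends go into a growable buffer, and a string is
    // produced once at the end.
    public static string WithBuilder(string[] parts)
    {
        StringBuilder sb = new StringBuilder();
        foreach (string part in parts)
        {
            sb.Append(part);
        }
        return sb.ToString();
    }
}
```

With a few hundred or more parts, WithBuilder is the version usually expected to win, but as elsewhere in this thread, it is worth measuring with your own data.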
Yeah, but that's not a lot of concatenations. That's one big
concatenation.
This is a lot of concatenations:
string sb = contextZero.ID;
sb += ciDelimiter;
sb += dvZero.ID;
sb += ciDelimiter;
sb += contextOne.ID;
sb += ciDelimiter;
sb += dvOne.ID;
sb += ciDelimiter;
sb += contextTwo.ID;
sb += ciDelimiter;
sb += dvTwo.ID;
sb += ciDelimiter;
sb += chDtls.ID;
sb += chDtls.IDExtension;
-- Rick
Just to add my $.02: the above gets translated to a single
String.Concat(string[]) call as opposed to a lot of
String.Concat(string, string) calls. That's why replacing it with
StringBuilder will actually make the code slower.
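A small sketch of the equivalence being described. The class name SingleConcat is hypothetical, and which String.Concat overload a given compiler version actually emits for the + expression is an assumption here, not something this code proves. What it does show is that the one-expression form and an explicit String.Concat(string[]) call produce the same result in a single step.

```csharp
using System;

public static class SingleConcat
{
    // One expression with several + operators: no intermediate
    // variables are assigned between the pieces.
    public static string WithPlus(string a, string b, string c, string d, string e)
    {
        return a + b + c + d + e;
    }

    // The explicit form: one call that sees all the pieces at once and
    // can size the result before copying.
    public static string WithConcat(string a, string b, string c, string d, string e)
    {
        return string.Concat(new string[] { a, b, c, d, e });
    }
}
```

Either form avoids the chain of intermediate strings that a sequence of separate += statements creates.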
Jerry
> public static string GetID(...)
> {
> string ret = contextZero.ID;
> ret += ciDelimiter;
> ret += dvZero.ID;
> // etc
> }
>
> or worse still, doing a lot of concatenations in a loop.
Loop is the key word here :) Especially one that runs a couple hundred
times...
> --
> Jon Skeet - <sk...@pobox.com>
> http://www.pobox.com/~skeet
> If replying to the group, please do not mail me too
Jerry
So I am right in concluding that code that looks like this:
public static string GetID(...)
{
    string ret = contextZero.ID;
    ret += ciDelimiter;
    ret += dvZero.ID;
    // etc
}
should use StringBuilders (or does it even matter)?
If the code doesn't do any inserts or replaces, would it be wise to change
the code to look like this:
public static string GetID(...)
{
    return contextZero.ID + ciDelimiter + dvZero.ID + ciDelimiter +
           contextOne.ID + ciDelimiter + dvOne.ID + ciDelimiter +
           contextTwo.ID + ciDelimiter + dvTwo.ID + ciDelimiter +
           chDtls.ID + chDtls.IDExtension;
}
I understand that readability is also a concern.
Thanks.
ice
"Fergus Cooney" <filt...@tesco.net> wrote in message
news:uLPqodBl...@tk2msftngp13.phx.gbl...
If it uses more than a few string concatenations, it would be worth
using a StringBuilder. For fewer than about 5, it's unlikely to be
faster and may be slower.
> If the code doesn't do any inserts or replaces, would it be wise to change
> the code to look like this:
>
> public static string GetID(...)
> {
> return contextZero.ID + ciDelimiter + dvZero.ID + ciDelimiter +
> contextOne.ID + ciDelimiter + dvOne.ID + ciDelimiter +
> contextTwo.ID + ciDelimiter + dvTwo.ID + ciDelimiter +
> chDtls.ID + chDtls.IDExtension;
> }
>
> I understand that readability is also a concern.
Yes, make readability the principal issue, then work out the
bottlenecks. Once you've done that, convert just the appropriate bits
of code into that second form, which should be the fastest (IMO).
This is actually a great thread because it shows why even the best minds (no
sarcasm implied here at all) are often fooled when it comes to perf.
Assumptions can easily be wrong, so it's vitally important to measure. I'm
rarely bold enough to predict even the most basic perf results because I
remember how many times I was burned. Certainly I would never make a
statement so bold as "Use StringBuilders to concatenate" without having a
deeper understanding of the concatenation pattern. I tend to give advice
like "Many users find StringBuilders useful for their concatenation patterns;
consider using them and measure to see if they help you". Wussy but safe :)
OK, now for some actual, no-kidding-around useful advice (I hope).
When StringBuilders are faster than regular strings, it is because of string
allocations and copies that didn't have to happen. Imagine a StringBuilder
that had a buffer that was always the perfect size for the string it held.
It'd be useless, right? For sure the next string you appended wouldn't
fit, and you'd have to make a new buffer (of the new exact size). In fact,
such a StringBuilder wouldn't be very much different from just a string. So
having some slop on the end is important; it's because there's a little room
at the end that we can add things without having to copy.
Well, OK, so how much slop? Suppose there was some slop but that the slop
wasn't usually enough to accommodate the appends that happen. Again we'd be
worse off than with just strings (strings at least have no slop). There has
to be enough slop that it's likely that many appends will fit within it;
otherwise there isn't much savings.
Let me break it into 4 cases:
Big string, small appends:
=> It's quite likely that appends will fit in the slop, so they're fast.
This is the best case. (The buffer size becomes double the string when it no
longer fits, so on average the slop is half the current string length.) If
there are lots of small appends to a big string, you win the most by using
StringBuilder.
Big string, big appends:
=> While the string is comparable in size to (or smaller than) the appends,
StringBuilder won't save you much. If this continues to the point where the
appends are small compared to the accumulated string, you're in the good case.
Small string, big appends:
=> Bad case. StringBuilder will just slow you down until enough slop has
built up to hold those appends. You move to "big string, big appends" as you
append, and finally to "big string, small appends" if/when the buffer becomes
colossal.
Small string, small appends:
=> Could be OK if you had a good idea how big your string was going to get
and preallocated enough so that you have sufficient slop for the appends.
You might be able to do better if you just concatenated all the small appends
together in one operation.
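The "big string, small appends" case can be made concrete with a rough model of how many characters get written in total. This is a sketch under simplifying assumptions, not a measurement: the class name CopyTrafficModel is hypothetical, the doubling buffer is an idealized one that copies its contents once per growth, and real runtimes may behave differently.

```csharp
using System;

public static class CopyTrafficModel
{
    // Repeated +=: each append writes a brand new string containing
    // everything so far plus the new piece.
    public static long RepeatedConcat(long appendSize, long count)
    {
        long length = 0;
        long written = 0;
        for (long i = 0; i < count; i++)
        {
            length += appendSize;
            written += length;
        }
        return written;
    }

    // Idealized doubling buffer: an append that fits in the slop only
    // writes the new piece; one that does not fit first copies the
    // existing contents into a larger buffer.
    public static long DoublingBuffer(long appendSize, long count, long startCapacity)
    {
        if (startCapacity < 1)
        {
            throw new ArgumentOutOfRangeException("startCapacity");
        }

        long length = 0;
        long capacity = startCapacity;
        long written = 0;
        for (long i = 0; i < count; i++)
        {
            if (length + appendSize > capacity)
            {
                while (capacity < length + appendSize)
                {
                    capacity *= 2;
                }
                written += length; // copy existing contents once per growth
            }
            written += appendSize;
            length += appendSize;
        }
        return written;
    }
}
```

For 1,000 appends of 10 characters each, the repeated-concat model writes 5,005,000 characters, while the doubling model stays under three times the final 10,000-character length.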
One last tip: Sometimes you can get substantial savings by changing
currency in the right places in your algorithm
Pattern A:
StringBuilder sb = new StringBuilder(SuitableSize);
// sb gets a bunch of stuff
sb.Append(GetMyObjectID(foo, bar)); // GetMyObjectID makes a string
Pattern B:
StringBuilder sb = new StringBuilder(SuitableSize);
// sb gets a bunch of stuff
AppendMyObjectID(sb, foo, bar); // function puts the ID directly into the buffer
PatternB has no temporary string for the return value, this *might* be
better depending on the nature of the ObjectID composition/calculation.
Something you'd want to measure.
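A minimal sketch of what the two helpers might look like. The method names come from the patterns above, but their bodies, the class name ObjectIdPatterns, and the "foo:bar" ID format are all assumptions made up for illustration.

```csharp
using System;
using System.Text;

public static class ObjectIdPatterns
{
    // Pattern A helper: composes the ID as a temporary string, which the
    // caller then appends to its own builder.
    public static string GetMyObjectID(string foo, string bar)
    {
        return foo + ":" + bar;
    }

    // Pattern B helper: writes the same pieces straight into the caller's
    // builder, so no temporary string is created for the return value.
    public static void AppendMyObjectID(StringBuilder sb, string foo, string bar)
    {
        sb.Append(foo).Append(':').Append(bar);
    }
}
```

Both helpers leave the same text in the builder, so switching between them is a local change that can be timed on its own.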
Remember it's all about reducing memory traffic so the competitors are
-the memory in the stringbuilder, including the slop
-the temporary strings (if any) in your algorithm with and without
stringbuilders
-the final output string (note that getting the string out of a
stringbuilder doesn't cause a new alloc, the existing buffer is converted
into a string and then the stringbuilder is logically empty so you don't pay
this cost twice if you use stringbuilder. You do pay for the final output if
you don't use stringbuilder but then you didn't have to pay for the builder
up front)
It's very hard to say which is faster/smaller in general... it's all about
the usage pattern.
--
This posting is provided "AS IS" with no warranties, and confers no rights.
Rico Mariani
CLR Performance Architect
"Rico Mariani [MSFT]" <ri...@online.microsoft.com> wrote in message
news:u1%23KsnMn...@TK2MSFTNGP12.phx.gbl...
Ray Beckett
"Rico Mariani [MSFT]" <ri...@online.microsoft.com> wrote in message
news:u1%23KsnMn...@TK2MSFTNGP12.phx.gbl...
"Ray Beckett" <raybe...@hotmail.com> wrote in message
news:%23uXd0qm...@TK2MSFTNGP11.phx.gbl...