Slow?

Al Fansome

Nov 18, 2009, 2:13:15 AM

I have a set of about 230 files that I want to modify. I use the
following script:

rm tmp.tmp

foreach ($file in dir *exitlc.cfg)
{
    echo $file.name
    mv $file tmp.tmp

    foreach ($line in cat tmp.tmp)
    {
        if ($line -match "^#.+version")
        {
            $line = $line -replace "version[ \t]+[^ \t]+", "version 1.03"
        }

        if ($line -match "(.+DS60_[^ \t]+[ \t]+[^ \t]+[ \t]+)")
        {
            $line = $matches[1] + "25 25"
        }

        $line | add-content -encoding ascii $file
    }
    rm tmp.tmp
}

The script works just fine, but it takes between 1.5 and 2.0 seconds to
execute for each file. Each of these files is simple ASCII text, about
300 lines long. At first blush this seems like a very long time to
process such small files. I'd appreciate comments as to whether I could
do better, and as to why this script takes so long.

This is using PS 2.0. My OS is Windows XP SP3. CPU is 2.6 GHz, and RAM
is 1 gigabyte. Disk is a Promise RAID 1 array of two 120 GB drives.

Thanks,

Al

stej

Nov 18, 2009, 3:05:37 AM

What I would try:

1)

if ($line -match "^#.+version")
{
    $line = $line -replace "version[ \t]+[^ \t]+", "version 1.03"
}

Convert this to a single command that uses only -replace, something like:

$line = $line -replace '(?<pre>^#.+)(?<v>version[ \t]+[^ \t]+)','${pre}version 1.03'

2) maybe .+? instead of .+ would be quicker.

3) I wouldn't add each line one by one, I would save it all at once.
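
A minimal sketch of suggestion 1, assuming a made-up sample line (the real .cfg content was not posted). A line that does not match the pattern comes back from -replace unchanged, so the separate -match test should not be needed:

```powershell
# Hypothetical sample line; the real file content is not shown in this thread.
$line = '# exitlc config version 1.02 draft'

# One -replace does both the test and the substitution.
# Single quotes stop PowerShell from expanding ${pre} before the regex engine sees it.
$pattern = '(?<pre>^#.+)(?<v>version[ \t]+[^ \t]+)'
$result = $line -replace $pattern, '${pre}version 1.03'
$result

# A line without a leading '#' does not match and is returned as-is.
$other = 'DS60_A 10 20 30' -replace $pattern, '${pre}version 1.03'
$other
```

The names $pattern, $result and $other are only for illustration.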

If you publish the files somewhere, I can check how quick it is on my
machine.

stej

Roman Kuzmin

Nov 18, 2009, 8:27:23 AM

> 3) I wouldn't add each line one by one, I would save it all at once.

I think this should solve the problem. Do not call add-content for
each line. Instead try something like this:

. {

foreach ($line in cat tmp.tmp)
{

...
$line
}
} | Set-Content $file
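
Filled in, the idea might look like the sketch below. It is only an illustration under some assumptions: the sample file and its two lines are made up, and instead of moving the file to tmp.tmp it reads all lines into a variable first, so that the file is closed again before Set-Content rewrites it.

```powershell
# Made-up sample file standing in for one *exitlc.cfg file.
$file = Join-Path ([IO.Path]::GetTempPath()) 'sample_exitlc.cfg'
Set-Content -Encoding ascii $file '# sample version 1.02', 'x DS60_A 10 20 30'

# Read the whole file up front; Get-Content has finished (and released
# the file) before Set-Content opens it for writing.
$lines = @(Get-Content $file)

$lines | ForEach-Object {
    $line = $_
    if ($line -match '^#.+version') {
        $line = $line -replace 'version[ \t]+[^ \t]+', 'version 1.03'
    }
    if ($line -match '(.+DS60_[^ \t]+[ \t]+[^ \t]+[ \t]+)') {
        $line = $matches[1] + '25 25'
    }
    $line   # emit to the pipeline instead of calling Add-Content per line
} | Set-Content -Encoding ascii $file

Get-Content $file
```

This way the file is opened once for reading and once for writing, rather than once per line.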

==
Thanks,
Roman Kuzmin
http://code.google.com/p/farnet/
PowerShell and .NET support in Far Manager

Martin Zugec

Nov 18, 2009, 4:00:22 PM

Having had a quick peek at your code, I would write it a little bit
differently and use RegEx replace instead:
http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.replace.aspx

Martin

stej

Nov 18, 2009, 5:07:44 PM

IMHO RegEx replace doesn't help very much here:

Measure-Command { $r = new-object text.regularexpressions.regex '3.*4','compiled'; 1..10000 | % { $r.replace(('123456'*10),'x') } }
2.217 sec

Measure-Command { 1..10000 | % { [regex]::replace(('123456'*10),'3.*4','x', 'Compiled') } }
2.52 sec

Measure-Command { 1..10000 | % { ('123456'*10) -replace '3.*4','x' } }
2.21 sec

stej


Al Fansome

Nov 18, 2009, 5:18:32 PM

Instead of using add-content on each line, I accumulate each modified
line for the current file in a buffer variable:

$buf = $buf + $line + "`n"

After the file has been processed, I dump the entire buffer into the file:

set-content -encoding ascii -value $buf $file

I also added some instrumentation to measure per-file times. The code
now processes one file per 70 ms, as opposed to one file per 1500 ms, an
improvement by more than a factor of 20. Very impressive.

I imagine the problem with add-content is that it does a complete open,
append, and close cycle on the file for each line, and all of that extra
directory overhead is what's using up all the time.

Thanks for the advice.

stej

Nov 18, 2009, 5:29:18 PM

Hm, that's interesting!

You might try using the StringBuilder class as well. Every time you join
strings, a new string is created, which is expensive in CPU and memory.

$buff = new-object text.stringbuilder 200 # the argument is the expected total length (default capacity is 16)
[void]$buff.AppendLine('first line')
[void]$buff.AppendLine('second line')
$buff.ToString() #returns "first line<new line>second line<new line>"

In your case it would be [void]$buff.AppendLine($line)
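
Put together for one file, a rough sketch could look like this. The three sample strings are made up and stand in for the already modified lines of a file; $lines and $text are illustrative names only.

```powershell
# Sample lines standing in for the modified lines of one file.
$lines = 'first line', 'second line', 'third line'

$buff = New-Object System.Text.StringBuilder
foreach ($line in $lines) {
    # AppendLine returns the builder itself; [void] keeps it out of the output.
    [void]$buff.AppendLine($line)
}

# One string for the whole file, ready to be written with a single set-content call.
$text = $buff.ToString()
$text.Length
```

AppendLine uses the platform's line terminator, so the resulting length depends on where this runs.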

stej


Al Fansome

Nov 18, 2009, 6:04:14 PM

I made $buf into a stringbuilder object, and it improved speed by about
10%. My guess is that the total time is dominated by file I/O.

Larry__Weiss

Nov 18, 2009, 8:11:53 PM

My measurements show that

$buf = $buf + ($line + "`n")

will execute much faster than the unparenthesized form if a large number
of iterations is involved.

- Larry

stej

Nov 19, 2009, 2:48:05 AM

Larry, just post your measurements. Mine are:

# append long string
measure-command { 1..5000 | % { $buff = new-object text.stringbuilder 10000; 1..100 |% { $buff.AppendLine(('a'*100)) } } }
40.8 sec
measure-command { 1..5000 | % { $buff = ''; 1..100 |% { $buff += ('a'*100) +"`n" } } }
43.1 sec

# append very short string
measure-command { 1..50 | % { $buff = new-object text.stringbuilder 10000; 1..10000 |% { $buff.AppendLine('a') } } }
36.1 sec
measure-command { 1..50 | % { $buff = ''; 1..10000 |% { $buff += 'a' +"`n" } } }
43.5 sec

It depends very much on the length of the appended string.


Larry__Weiss

Nov 19, 2009, 11:40:59 AM

Yes, I have some measurements that clearly show that the infix operator way of
building a string up by small appends does not scale well, and doing it by the
pattern

$s = $s + "xxx" + "yyy"

just makes it slower still by an unexpectedly large factor.

Consider:

$s = ""; Measure-Command {1..10 | % {$s = $s + "xyzzy" + "`n"} } | fl Ticks
Ticks : 68271
6,827 ticks per append

$s = ""; Measure-Command {1..100 | % {$s = $s + "xyzzy" + "`n"} } | fl Ticks
Ticks : 547485
5,475 ticks per append

$s = ""; Measure-Command {1..1000 | % {$s = $s + "xyzzy" + "`n"} } | fl Ticks
Ticks : 4095285
4,095 ticks per append

$s = ""; Measure-Command {1..10000 | % {$s = $s + "xyzzy" + "`n"} } | fl Ticks
Ticks : 181310792
18,131 ticks per append

$s = ""; Measure-Command {1..20000 | % {$s = $s + "xyzzy" + "`n"} } | fl Ticks
Ticks : 1032937725
51,647 ticks per append

So we see it not scaling well at all for the largest iterations.

Interestingly, look what this code variation measures as:

$s = ""; Measure-Command {1..10 | % {$s = $s + "xyzzy`n"} } | fl Ticks
Ticks : 62555
6,256 ticks per append

$s = ""; Measure-Command {1..100 | % {$s = $s + "xyzzy`n"} } | fl Ticks
Ticks : 588960
5,890 ticks per append

$s = ""; Measure-Command {1..1000 | % {$s = $s + "xyzzy`n"} } | fl Ticks
Ticks : 3829056
3,829 ticks per append

$s = ""; Measure-Command {1..10000 | % {$s = $s + "xyzzy`n"} } | fl Ticks
Ticks : 109793102
10,979 ticks per append (compare to 18,131 ticks per append)

$s = ""; Measure-Command {1..20000 | % {$s = $s + "xyzzy`n"} } | fl Ticks
Ticks : 571517650
28,576 ticks per append (compare to 51,647 ticks per append)

So, not much optimization going on there!

- Larry

stej

Nov 19, 2009, 3:42:32 PM

Based on your measurements I tried this:

> $s = ""; Measure-Command {1..20000 | % {$s = $s + "xyzzy" + "`n"} } | select -exp ticks; $s.length
216039306
120000
> $s = ""; Measure-Command {1..(20000/4*2) | % {$s = $s + "xyzzy" + "`n" + "xyzzy" + "`n"} } | select -exp Ticks; $s.length
193721120
120000
> $s = ""; Measure-Command {1..(20000/6*2) | % {$s = $s + "xyzzy" + "`n" + "xyzzy" + "`n" + "xyzzy" + "`n" } } | select -exp Ticks; $s.length
205190128
120006
> $s = ""; Measure-Command {1..(20000/8*2) | % {$s = $s + "xyzzy" + "`n" + "xyzzy" + "`n" + "xyzzy" + "`n" + "xyzzy" + "`n" } } | select -exp Ticks; $s.length
181509387
120000

IMHO it doesn't matter how many "+" there are. I think that all the
magic could be caused by the garbage collector. With every + a new
string is allocated and the old one is forgotten. That's quite memory
expensive, and when too many objects are allocated, the GC runs.

I mentioned StringBuilder because of this
(it's equivalent to the last command):

> $buff = new-object text.stringbuilder; Measure-Command { 1..(20000/8*2) | % {
$buff.append("xyzzy"); $buff.append("`n");
$buff.append("xyzzy"); $buff.append("`n");
$buff.append("xyzzy"); $buff.append("`n");
$buff.append("xyzzy"); $buff.append("`n") } } | select -exp Ticks; $buff.length
9064070
120000

Larry__Weiss

Nov 19, 2009, 4:10:45 PM

The most important thing here is to realize that there are alternatives like
StringBuilder. It's absolutely vital when the real world scales up on you to a
degree you have never experienced when testing.

A degradation at an almost exponential rate is not something you want to allow
Murphy to have at his disposal! (or maybe better to say it will prove Murphy
correct at the worst of times)

- Larry

Larry__Weiss

Nov 19, 2009, 4:50:34 PM

Have you tried

$s = ""; Measure-Command {1..20000 | % {$s = $s + "xyzzy" + "`n"} } | select -exp ticks; $s.length
1349420647
120000

$s = ""; Measure-Command {1..20000 | % {$s = $s + ("xyzzy" + "`n")} } | select -exp ticks; $s.length
727340851
120000

$s = ""; Measure-Command {1..20000 | % {$s = $s + "xyzzy`n"} } | select -exp ticks; $s.length
732108064
120000

Do you get this large difference in runtime?

- Larry

stej

Nov 20, 2009, 6:13:53 AM

On my parents-in-law's computer:

> $s = ""; Measure-Command {1..20000 | % {$s = $s + "xyzzy" + "`n"} } | select -exp ticks; $s.length

258436122

120000
> $s = ""; Measure-Command {1..20000 | % {$s = $s + ("xyzzy" + "`n")} } | select -exp ticks; $s.length

107454640

120000
> $s = ""; Measure-Command {1..20000 | % {$s = $s + "xyzzy`n"} } | select -exp ticks; $s.length

135194417
120000

I have no explanation for why there is such a huge difference.

stej

Larry__Weiss

Nov 20, 2009, 11:09:19 AM

This one is interesting enough that I may open an issue about it at

https://connect.microsoft.com/powershell

- Larry

Larry__Weiss

Nov 21, 2009, 12:53:43 PM

I created the feedback issue

https://connect.microsoft.com/PowerShell/feedback/ViewFeedback.aspx?FeedbackID=513075

- Larry
