[msh] get-content of large files

Dung K Hoang

Mar 3, 2006, 4:46:10 PM
Hi,
I notice that when you use the cmdlet get-content on large files, it takes
some time before any output appears on the console.
I also observe that during the execution of the command, the virtual memory
used by the msh.exe process can grow by up to 300%, and this virtual memory
is not released afterwards.
My question is: is this normal behavior? Any suggestion to use a different
cmdlet?

Thanks
/Dung


Nigel Sharples [MSFT]

Mar 3, 2006, 5:29:21 PM
When I try it with a 250MB file I see content on the screen immediately, and
the process memory consumption doesn't grow. What does your command line
look like, and how large is the input file?

--
Nigel Sharples [MSFT]
Monad Test
Microsoft Corporation
This posting is provided "AS IS" with no warranties, and confers no rights.


"Dung K Hoang" <dungho...@hotmail.com> wrote in message
news:uXqIqvw...@TK2MSFTNGP10.phx.gbl...

Jeff Jones [MSFT]

Mar 3, 2006, 6:59:52 PM
Try specifying the -ReadCount parameter with a value of -1.

MSH > get-content largefile.txt -readcount -1


--
Jeff Jones [MSFT]
Monad Development
Microsoft Corporation
This posting is provided "AS IS" with no warranties, and confers no rights.

"Dung K Hoang" <dungho...@hotmail.com> wrote in message
news:uXqIqvw...@TK2MSFTNGP10.phx.gbl...

Alex K. Angelopoulos

Mar 3, 2006, 7:27:00 PM
"Jeff Jones [MSFT]" <jef...@online.microsoft.com> wrote in message
news:eooNR6xP...@TK2MSFTNGP10.phx.gbl...

> Try specifying the -ReadCount parameter with a value of -1.
>
> MSH > get-content largefile.txt -readcount -1

This does sound to me like there may be another underlying problem, Jeff -
although the workaround is good of course. :)

I'm having no problems with initial spin-up reading text files of up to 80 MiB
and even - most significantly - an ISO image that is 3 GiB, about 4 times
the size of my physical RAM. Clearly, my copy of get-content does not try
to read everything at once. Even if I pipe this into a filtering command, I
get immediate output.

The issue isn't Unicode per se, but I'm wondering if this might be either an
issue with the install or possibly a strange internationalization problem.

Hoang, when you answer Nigel's post, could you also tell us what version of
MSH you have, and what your OS and locale settings are?

Dung K Hoang

Mar 3, 2006, 7:44:55 PM
Thanks for your prompt answer!

My code is quite simple. The script looks like:

param ( $filename = (read-host "Filename pls") )
foreach ( $line in get-content $filename -readcount 1 )   # Try with readcount
{
    $line
}

Here is my environment:

Windows XP SP2 - Monad Beta 3.0 - Locale US
Windows server 2003 SP1 x64 - Monad Beta 3.1 - Locale US

My script reads Exchange message tracking logs, IIS 6 logs, and ISA
Server 2004 log files. The average size is 40 MB.
The log files are ASCII text files in the W3C standard format.

/Dung

"Jeff Jones [MSFT]" <jef...@online.microsoft.com> wrote in message
news:eooNR6xP...@TK2MSFTNGP10.phx.gbl...

Nigel Sharples [MSFT]

Mar 3, 2006, 8:22:49 PM
Try replacing your foreach line with this:

get-content $filename | foreach {$_}

--
Nigel Sharples [MSFT]
Monad Test
Microsoft Corporation
This posting is provided "AS IS" with no warranties, and confers no rights.

"Dung K Hoang" <dungho...@hotmail.com> wrote in message

news:%23S$pmTyPG...@TK2MSFTNGP09.phx.gbl...

Keith Hill [MVP]

Mar 3, 2006, 8:50:44 PM
"Dung K Hoang" <dungho...@hotmail.com> wrote in message
news:%23S$pmTyPG...@TK2MSFTNGP09.phx.gbl...

> Thanks for your prompt answer!
>
> My code is quite simple. The script looks like:
>
> param ( $filename = (read-host "Filename pls") )
> foreach ( $line in get-content $filename -readcount 1 )   # Try with readcount
> {
>     $line
> }

What Nigel suggests is a better way. With your approach (shown above), I'm
pretty sure that get-content has to return everything into an array *before*
foreach can start iterating over the contents of the array (or collection).
Using the pipeline approach allows you to start outputting the lines right
away.
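
To make the contrast concrete, here is a minimal sketch of both forms
(largefile.txt is just a placeholder):

# Collects every line into an array first; nothing is output until
# get-content has finished reading the whole file.
foreach ($line in get-content largefile.txt) { $line }

# Streams each line down the pipeline as it is read, so output
# starts immediately.
get-content largefile.txt | foreach { $_ }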

--
Keith


Dung K Hoang

Mar 3, 2006, 8:46:30 PM
Hi,

That does the trick!
What is the rationale behind this?

As for the virtual memory issue, here is what I observe:
a) If I simply run the script, the process memory consumption does not grow.
b) When I apply time-expression to the script, i.e. time-expression { my-script },
to get the processing time, there is no output to the console, and
this is where the memory consumption increases.

Thanks for your answer,
/Dung

"Nigel Sharples [MSFT]" <nig...@microsoft.com> wrote in message
news:4408ebec$1...@news.microsoft.com...

Alex K. Angelopoulos

Mar 3, 2006, 8:57:19 PM

"Nigel Sharples [MSFT]" <nig...@microsoft.com> wrote in message
news:4408ebec$1...@news.microsoft.com...
> Try replacing your foreach line with this:
>
> get-content $filename | foreach {$_}

What's actually going on, if it isn't apparent to you, Hoang, is that when
you do this:

foreach ( $line in get-content $filename )
{
    ...
}

you assemble all of the lines into a collection before beginning processing.

When you do it the way Nigel shows, the pipeline passes each line into the
foreach { } block as soon as it is received. The result is that you do
per-item processing, so you get immediate output, and you also end up using
much less memory since you aren't putting everything into memory at once.
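
Put differently, the statement form behaves roughly like this sketch: the
whole file lands in $lines before the loop body ever runs:

$lines = get-content $filename   # the entire file is read into an array here
foreach ( $line in $lines )
{
    $line                        # iteration starts only after the read completes
}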


Dung K Hoang

Mar 3, 2006, 8:57:55 PM
As for the virtual memory issue, in the time-expression, if I redirect the
output of the script to $Null then everything is OK!

time-expression { my-script > $Null }

Another lesson learned!
You guys are great!
Thanks for your help,
/Dung

"Dung K Hoang" <dungho...@hotmail.com> wrote in message

news:O22354yP...@TK2MSFTNGP10.phx.gbl...

Jeff Jones [MSFT]

Mar 6, 2006, 1:30:28 PM
time-expression calls Invoke on the ScriptBlock you specify to it on the
command line. Invoke gathers all the results and returns them to
time-expression which then writes them out to the pipeline. That is
probably where you are seeing the spike in memory usage. We would like to
provide more of a streaming behavior but that fell off the schedule for v1.
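
So until then, the practical pattern is the one you found: discard the
script block's output inside the braces so nothing has to be gathered
(largefile.txt is a placeholder):

MSH> time-expression { get-content largefile.txt | foreach { $_ } > $null }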

--
Jeff Jones [MSFT]
Monad Development
Microsoft Corporation
This posting is provided "AS IS" with no warranties, and confers no rights.

"Dung K Hoang" <dungho...@hotmail.com> wrote in message

news:ef53Y8yP...@TK2MSFTNGP09.phx.gbl...

/\/\o\/\/

Mar 6, 2006, 1:44:07 PM
Jeff Jones [MSFT] wrote:
> time-expression calls Invoke on the ScriptBlock you specify to it on the
> command line. Invoke gathers all the results and returns them to
> time-expression which then writes them out to the pipeline. That is
> probably where you are seeing the spike in memory usage. We would like to
> provide more of a streaming behavior but that fell off the schedule for v1.
>

I noticed that select (and I think ft) has the same behavior.

As I'm testing with big SMS queries, the effect will take up a gig or more
of memory (in effect hanging my machine).

When I'm using the default output all goes well (the WMI query is tuned to
return the records one by one).

I'm not using a scriptblock or anything in the select statement, so I
see no need for begin/end processing.

Why does select gather the results?
And is there a way to prevent this?

greetings /\/\o\/\/

Marcel Ortiz [MSFT]

Mar 6, 2006, 2:33:22 PM
Could you send us the commands you are using? The following commands using
select and format-table seem to be streaming (unless -last is used).
MSH>([string[]](1..3)) | foreach { write-host $_; $_ } | select Length | foreach { write-host foo }
1
foo
2
foo
3
foo
MSH>([string[]](1..3)) | foreach { write-host $_; $_ } | select Length -last 10 | foreach { write-host foo }
1
2
3
foo
foo
foo
MSH>([string[]](1..3)) | foreach { write-host $_; $_ } | select Length -first 10 | foreach { write-host foo }
1
foo
2
foo
3
foo
MSH>([string[]](1..3)) | foreach { write-host $_; $_ } | select Length | format-table | foreach { write-host foo }
1
foo
foo
foo
2
foo
3
foo
foo
foo
MSH>foreach-object { for(){ write-object "foo" } } | foreach { write-host "foo"; $_ } | select Length | format-table | foreach { write-host "bar" }
foo
bar
bar
bar
foo
bar
foo
bar

In this last command, I don't see any increase in the memory footprint after
running for about 15 minutes. However, if possible, send us your command so
we can be sure about this.

Thanks.

--
Marcel Ortiz [MSFT]
Monad: Microsoft Command Shell
Microsoft Corporation
This posting is provided "AS IS" with no warranties, and confers no rights.


"/\/\o\/\/" <n...@spam.mow> wrote in message
news:%23kTh73U...@TK2MSFTNGP14.phx.gbl...

/\/\o\/\/

Mar 6, 2006, 2:51:11 PM
I can post the examples tomorrow, though without a 4000-client SMS
environment they might be hard to reproduce ;-)

I will also look at whether I can create some other examples.

gr /\/\o\/\/

Dung K Hoang

Mar 7, 2006, 8:11:46 AM
By the same token, can we have a list of cmdlets that do not provide
streaming?
/\/\o\/\/ has identified some of them (albeit pending validation by MS). I
also notice that the group cmdlet can consume a lot of memory when dealing
with a big number of objects.

Regards,
/Dung

"/\/\o\/\/" <n...@spam.mow> wrote in message
news:%23kTh73U...@TK2MSFTNGP14.phx.gbl...

/\/\o\/\/ [MVP]

Mar 7, 2006, 11:52:14 AM
One SMS query I have the "caching" problem with:

$oq = new-object system.management.objectquery
$oq.QueryString = "select * from SMS_CM_RES_COLL_MOW000B6"
$mos = new-object system.management.ManagementObjectSearcher($oq)
$mos.scope.path = "\\SMSServer\root\sms\Site_MOW"

# streaming (items one-by-one):

$mos.get()

# gathering / blocking (items all at the same time):

$mos.get() | select name,ResourceID
$mos.get() | ft

B.t.w., this is only a collection, hence still workable; the real problems
begin when I do this with big software queries.

/\/\o\/\/ [MVP]

Mar 7, 2006, 12:01:24 PM
I have another example that might be easier to reproduce:

$oq = new-object system.management.objectquery
$oq.QueryString = "select * from win32_ntlogevent"
$mos = new-object system.management.ManagementObjectSearcher($oq)

# streaming:

$mos.get()

# caching:

$mos.get() | select type
$mos.get() | ft

B.t.w., I also disabled the Rewindable property (and tested with
BlockSize):

MSH>$mos.options

ReturnImmediately    : True
BlockSize            : 10
Rewindable           : False
UseAmendedQualifiers : False
EnsureLocatable      : False
PrototypeOnly        : False
DirectRead           : False
EnumerateDeep        : False
Context              : {}
Timeout              : 10675199.02:48:05.4775807

thanks for looking into it

gr /\/\o\/\/

Marcel Ortiz [MSFT]

Mar 7, 2006, 5:01:40 PM
Thanks for the repros! I looked into it and it's not a problem with the
cmdlets. It's a problem with the implementation of pipelines. If you have a
pipeline and the first element is an IEnumerable, it is iterated over
completely before the elements are sent to the next element. As you've
probably seen from your repro, $mos.get() returns a very large collection
that doesn't enumerate very quickly, so when you use it as the first pipeline
element MSH appears to hang while it iterates over it. While this iteration
is going on you can't cancel the pipeline, so you're stuck waiting until it
finishes. Anyway, I'm opening a bug. Hopefully, we'll get this fixed. In
the meantime, here's a workaround:

MSH>1 | foreach { $mos.Get() } | select Type

Type
----
error
error
error
error
error
error
information
error
error
error
error
error

In this case, the IEnumerable returned by $mos.Get() isn't the first item of
a pipeline so you don't get that delay.
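
Applied to your win32_ntlogevent example, the same pattern would look like
this (same setup as your repro):

$oq = new-object system.management.objectquery
$oq.QueryString = "select * from win32_ntlogevent"
$mos = new-object system.management.ManagementObjectSearcher($oq)

# Wrapping the Get() call in a foreach script block keeps the IEnumerable
# out of the first pipeline position, so the results stream.
1 | foreach { $mos.get() } | select type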


Thanks again for the help!
Marcel


"/\/\o\/\/ [MVP]" <oM...@discussions.microsoft.com> wrote in message
news:E41E1EC5-C2F2-43B3...@microsoft.com...
