I'm having a very frustrating experience with .NET. I have a simple
crawler console application.
The main objective of the crawler is to read a list of URLs, make HTTP
calls to a web server, and save the HTML files locally.
I set up perfmon to monitor the memory usage of the application. I found
that the Gen 2 heap size keeps increasing and ultimately the system runs
out of memory, whereas the Gen 0 and Gen 1 heap sizes are stable (they
increase and decrease as the GC runs).
I understand that the objects that have lived long enough are ultimately
promoted to Gen 2. But none of my objects have
that much state information to cause the Gen 2 heap to grow incessantly!!
I'm using many temporary objects like
HttpWebRequests, StringBuilder and Streams. But these objects live only as
long as the HTTP request lasts. I'm not
saving these objects as my class members.
I would appreciate it if someone could shed some light on this strange
behaviour. I'm so frustrated that I'm planning to re-write the code in
C++... at least I'll have control over when the memory is released.
Thanks in advance.
Mahesh
Regards
Rahul
"Mahesh Prasad" <mahesh...@hotmail.com> wrote in message
news:OVJ0Ih7z...@TK2MSFTNGP10.phx.gbl...
1. I have a set of 30,000 URLs that I have to crawl. These URLs are stored
in a database.
2. I use a single DataSet object to read 100 URLs at a time.
3. For each URL I retrieve from the DataSet, I make an HTTP call to the web
server using an HttpWebRequest object.
4. When the web server returns the HTTP response, I save the response stream
in a StringBuilder object.
5. I then parse the HTML text stored in the StringBuilder object looking for
<img> tags. If I find some image references, I use another HttpWebRequest
object to request the images and save them locally using a FileStream
object.
6. I then save the HTML text in the StringBuilder object as an HTML file
locally using a FileStream object.
7. After all 100 URLs have been crawled, I clear the DataSet object using
DataSet.Clear(). Then I retrieve the next 100 URLs and the process continues.
The only objects that can take up memory are the StringBuilder,
HttpWebResponse and FileStream objects. But these objects are created in
function scope (i.e. they are not members of my class), so they go out of
scope after Step 6 above. Also, I call Close() on all the objects that
support this method.
During the process I noticed that the Gen 2 heap size just keeps growing.
What I don't understand is this: none of my class member variables hold that
much memory (i.e. they retain very little state information), so why does
the Gen 2 heap size increase? The only objects that can take up memory are
the ones created at function scope, so they should be freed the next time
the GC kicks in. So they shouldn't be part of the Gen 2 heap.
Please let me know if you want more details.
Thanks
Mahesh
"Rahul Kumar" <rahul...@saREMOVEITge.com> wrote in message
news:ewCGFvT0...@TK2MSFTNGP11.phx.gbl...
"Mahesh Prasad" <mahesh...@hotmail.com> wrote in message
news:#bofVZU0...@TK2MSFTNGP12.phx.gbl...
Also, you might try calling GC.Collect(2) and WaitForPendingFinalizers() to
force a gen 2 collection to take place to see if that really is the issue,
and/or you could set the MaxWorkingSet property of the current process to a
low number ( new IntPtr(1000000) ) to see if attempting to reduce the
working set of the process reclaims memory. *But I would only use those
techniques for diagnosing potential problems* - don't leave them in.
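
As a rough illustration of the diagnostic described above, the sketch below forces a full collection and reports the managed heap size before and after it. `HeapProbe` and `Measure` are made-up names, the second `GC.Collect(2)` is an addition of this sketch, and the working-set part of the suggestion is left out because its behaviour depends on the operating system. It is meant for diagnosis only.

```csharp
using System;

public static class HeapProbe
{
    // Hypothetical helper: forces a full (gen 2) collection and returns
    // the managed heap size in bytes before and after it.
    public static long[] Measure()
    {
        long before = GC.GetTotalMemory(false); // read the size without forcing a collection
        GC.Collect(2);                          // collect generations 0 through 2
        GC.WaitForPendingFinalizers();          // block until queued finalizers have run
        GC.Collect(2);                          // assumption: second pass to reclaim finalized objects
        long after = GC.GetTotalMemory(false);
        return new long[] { before, after };
    }
}
```

Calling `HeapProbe.Measure()` between batches and logging both numbers gives a trend: if the "after" figure keeps climbing from batch to batch, something is probably still holding references.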
Richard
--
C#, .NET and Complex Adaptive Systems:
http://blogs.geekdojo.net/Richard
"Mahesh Prasad" <mahesh...@hotmail.com> wrote in message
news:%23bofVZU...@TK2MSFTNGP12.phx.gbl...
--
Regards,
Alvin Bruney
Got tidbits? Get it here...
http://tinyurl.com/2bz4t
"Richard A. Lowe" <cha...@yumspamyumYahoo.com> wrote in message
news:urJuy1W0...@tk2msftngp13.phx.gbl...
I use the string object to read the html returned by the HttpWebResponse
object.
void GetPage(string URL)
{
    HttpWebRequest req = (HttpWebRequest)WebRequest.Create(URL);
    req.Timeout = 60000;
    HttpWebResponse resp = (HttpWebResponse)req.GetResponse();
    Stream stream = resp.GetResponseStream();
    StreamReader sr = new StreamReader(stream);
    string sHTML = sr.ReadToEnd();
    // close the response stream
    stream.Close();
    resp.Close();
    // close the reader stream
    sr.Close();
    // Pass string as a reference, to parse the Html for images.
    ParseHtmlForImages(ref sHTML);
    // save the html file locally
    StreamWriter sw = File.CreateText(localFilePath);
    sw.Write(sHTML);
    // close the reader and write streams
    sw.Close();
}
And the Byte array is used to store the binary data from the response stream
and save it locally.
void SaveImage(string URL)
{
    // Get Image from web server
    HttpWebRequest req = (HttpWebRequest)WebRequest.Create(URL);
    req.Timeout = 60000;
    HttpWebResponse resp = (HttpWebResponse)req.GetResponse();
    if (resp.StatusCode == HttpStatusCode.OK)
    {
        // save the image locally
        SaveImageLocally(ref resp);
    }
    resp.Close();
}
protected void SaveImageLocally(ref HttpWebResponse resp)
{
    Stream stream = resp.GetResponseStream();
    // This stream does not support seeking, so it cannot return length.
    // so allocate enough memory to save the binary image data.
    byte[] buffer = new Byte[10000];
    int BytesToRead = (int)buffer.Length;
    int BytesRead = 0;
    int n = 0;
    do
    {
        n = stream.Read(buffer, BytesRead, BytesToRead);
        BytesToRead -= n;
        BytesRead += n;
    } while (n > 0);
    stream.Close();
    FileStream fs = new FileStream(localFilePath, FileMode.Create);
    fs.Write(buffer, 0, BytesRead);
    fs.Close();
    buffer = null; // making sure GC frees this memory
}
"Richard A. Lowe" <cha...@yumspamyumYahoo.com> wrote in message
news:urJuy1W0...@tk2msftngp13.phx.gbl...
> This is a known issue affecting 1.1 Framework and 1.0 Framework. It is
> slated for fix in the next release of the framework. There is nothing you
> can do about it except be vigilant and responsible about your memory usage
> and memory allocation/deallocation.
Wasn't GC supposed to release programmers from this responsibility?
(Ha!) Clearly, the entire .NyET marketecture is a shining example
of cretinism, plain and simple.
--
Joe Foster <mailto:jlfoster%40znet.com>     L. Ron Dullard <http://www.xenu.net/>
WARNING: I cannot be held responsible for the above        They're coming to
because my cats have apparently learned to type.        take me away, ha ha!
Anyways, it is slated for a fix, so if you hang on, it shouldn't be that
long. But then again, it was slated for a 1.1 fix. There is a long discussion
on the C# Yahoo group as to where exactly the bug is located. I remember
someone saying it was deep in the Win32 memory code, meaning it was not easy
to fix.
If you really want a workaround, you will have to implement logic to
periodically unload the current application domain. At that point, all
allocated blocks are returned to the operating system. It's cheesy but it is
a valid workaround. You could do this when, for example, your application
notices that it is running out of memory. The GC class provides hooks to
give you that type of info, by the way.
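
The workaround described above might look roughly like the following. This is a sketch under stated assumptions, not a tested recipe: `DomainRecycler`, `ShouldRecycle` and `RunBatchInNewDomain` are invented names, the worker assembly path is a placeholder for an executable that crawls one batch, and the code assumes the .NET Framework (version 4 or later for this `ExecuteAssembly` overload), since newer runtimes do not support creating additional application domains.

```csharp
using System;

public static class DomainRecycler
{
    // Hypothetical policy: recycle once the managed heap passes a threshold.
    public static bool ShouldRecycle(long managedBytes, long thresholdBytes)
    {
        return managedBytes > thresholdBytes;
    }

    // Runs one batch in a fresh application domain and unloads it afterwards.
    // workerAssemblyPath is a placeholder for an .exe that crawls one batch.
    // Assumes the .NET Framework; AppDomain.CreateDomain is not supported elsewhere.
    public static void RunBatchInNewDomain(string workerAssemblyPath, string[] args)
    {
        AppDomain domain = AppDomain.CreateDomain("CrawlerBatch");
        try
        {
            domain.ExecuteAssembly(workerAssemblyPath, args);
        }
        finally
        {
            AppDomain.Unload(domain); // unload even if the batch throws
        }
    }
}
```

A host process could check `DomainRecycler.ShouldRecycle(GC.GetTotalMemory(false), someLimit)` after each batch and start the next batch in a new domain when it returns true; `someLimit` is whatever ceiling suits the machine.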
--
Regards,
Alvin Bruney
"Joe "Nuke Me Xemu" Foster" <j...@bftsi0.UUCP> wrote in message
news:e3La56k0...@TK2MSFTNGP12.phx.gbl...
I didn't know this was a known issue! Is there any mention of it on the
MSDN site?
Can you give more info (or links) about the workaround for the memory
problem?
Thanks
Mahesh
"Alvin Bruney" <vapor at steaming post office> wrote in message
news:esUUorl0...@TK2MSFTNGP10.phx.gbl...
Search for "Eric Gunnerson bob grommes memory leak" and
you should find the thread.
--
Regards,
Alvin Bruney
"Mahesh Prasad" <mahesh...@hotmail.com> wrote in message
news:%23Swk$Rw0DH...@TK2MSFTNGP12.phx.gbl...
Does the profiler tell you what is hanging on to a reference to these
objects?
> // save the html file locally
> StreamWriter sw = File.CreateText(localFilePath);
> sw.Write(sHTML);
> // close the reader and write streams
> sw.Close();
StreamWriter has a Dispose method. Whether it would make any difference I
don't know, but if there is a Dispose method you should call it.
> FileStream fs = new FileStream(localFilePath,FileMode.Create);
> fs.Write(buffer,0,BytesRead);
> fs.Close();
Ditto FileStream.
Tim
Didn't find anything that way. If there is a "known problem", can you tell
us what it is?
I realise there are specific issues, for example with the RegEx cache.
Tim
I started having the same problem with my ASP.NET application when loading
100k rows of data into memory. Nulling out the dataset and calling Dispose
still left the memory topped out in the aspnet worker process. I beat my
head against it for a while.
Incidentally, I took an aspnetpro course and brought it to the attention of
Lino Tadros. He pointed out that it's a known issue, that a lot of people
have been complaining about it, and that they had a fix in 2.0.
When I have time, and if you still need this, I'll search the newsgroup for
it.
--
Regards,
Alvin Bruney
"Tim Anderson" <tim...@hotmail.com> wrote in message
news:%23AOOc38...@TK2MSFTNGP09.phx.gbl...
> Incidentally, took an aspnetpro course brought it to the attention of Lino
> Tadros. He pointed out that it's a known issue and a lot of people have been
> complaining about it and they had a fix in 2.0.
>
> When i have time, and if you still need this, i'll search the newsgroup for
> it.
Yes please. I'd like to know the circumstances in which the memory is not
released.
Many thanks,
Tim
These aren't posted in time order; they are cut and pastes of the
conversations related to this thread that I found on the newsgroup.
The following is all I could find.
> I talked with one of the CLR guys, and he said that the GC will give memory
> back to the OS. I've seen a lot of reports about memory usage, and I've been
> told that task manager doesn't give an accurate view of what's going on, and
> that you should use the .NET perf counters instead. I haven't tried that
> myself yet.
Interesting, thanks, Eric. Now how do we submit a defect report on Task
Manager? ;->
Is there a nice app wrapping the .NET perf counters out there somewhere? I
suppose they have to be bound into the app under measurement?
Actually, I posed that question in a PPT
to the StlCSharpDotNet Sig last Monday.
We had a similar report in the StlVBDotNet
group about a month ago. No one knows what
the problem is, but we are looking at a
lock/unlock scenario higher than a page/user
lock: an application lock(more likely) or a
server lock(unlikely). The problem may go
on for more than a day, then go away.
Any other input??????
Rick
Yea, I had a problem with this earlier. It appears when you minimize an
application (Console or Windows), it forces Windows to reclaim the
memory from the .NET Framework that the GC has cleared. The task manager
memory report is not accurate on what your memory usage is unless you
minimize the program and instantly take a reading before it changes.
K-Dub
-----Original Message-----
From: Tomasz Siwarga [mailto:tomsiw@w...]
Sent: Wednesday, July 23, 2003 11:07 PM
To: CSha...@yahoogroups.com
Subject: Re: [C#.NET] Memory leak
I tested it on VS 2003 and it still doesn't work. But I found an
interesting effect. I created the simplest possible Windows application (a
simple form without any controls). When I run it, Windows Task Manager
shows about 14 MB of memory allocated by this application. When I minimize
the application's window, the allocated memory decreases dramatically (to
800 kB). If I restore the window, memory increases a bit but it is still
not that large (1.5 MB - 2 MB). In this way I free about 12 MB of memory.
This effect applies to almost all Windows applications (including IE,
WordPad ...)
> I thought the answer was "yes", but after a bit more thought, I fear
> that I'm not sure.
>
> I'll see if I can find out.
>
> -----Original Message-----
> From: Ron Jeffries [mailto:ronjeffries@a...]
> Sent: Wednesday, July 23, 2003 12:11 PM
> To: CSha...@yahoogroups.com
>
> On Wednesday, July 23, 2003, at 1:14:27 PM, Eric Gunnerson wrote:
>
> > When you allocate large objects (IIRC, large > 64K), they get
> > allocated from a large object heap, which isn't handled the same way
> > as normal sized objects.
>
> > There are some issues with the large heap not working right in the 1.0
> > version of the framework (aka VS 2002). It's fixed in the .NET
> > framework SP2, and also in VS 2003.
>
> Hi Eric,
>
> In any case, once a .NET runtime grabs memory for the heap from the OS,
> does it ever give it back?
>
> Ron Jeffries
> www.XProgramming.com
> I'm giving the best advice I have. You get to decide whether it's true
> for you.
C# .NET!
---------------------------------------
http://Groups.Yahoo.com/group/CSharpNET
CSharpNET - C#.NET/C# Developers' Group
Your use of Yahoo! Groups is subject to
http://docs.yahoo.com/info/terms/
From: "Eric Gunnerson" <ericgu@m...>
Date: Sun Aug 17, 2003 1:12 pm
Subject: RE: [C#.NET]Memory and GC
I talked with one of the CLR guys, and he said that the GC will give memory
back to the OS. I've seen a lot of reports about memory usage, and I've been
told that task manager doesn't give an accurate view of what's going on, and
that you should use the .NET perf counters instead. I haven't tried that
myself yet.
________________________________
From: Alvin Bruney [mailto:vapordan@h...]
Sent: Fri 8/15/2003 12:21 PM
To: CSha...@yahoogroups.com
Subject: Re: [C#.NET]Memory and GC
Sometime ago Bob asked a question about GC not giving back memory to the OS.
Eric said he would research this. Has there been a follow up response to
this issue?
Thanks
--
Regards,
Alvin Bruney
"Tim Anderson" <tim...@hotmail.com> wrote in message
news:e8S3uZ90...@TK2MSFTNGP12.phx.gbl...
Many thanks for your efforts. The known problems I see in this thread are:
- issue with large object heap fixed from FX 1.0 SP2
- issue with Task Manager not reporting correctly
From what you said I thought there was another issue not yet fixed?
Tim
First, I'd try adding in (for testing purposes only) calls to
GC.Collect();
GC.WaitForPendingFinalizers();
before you start to fetch the next 100 URLs.
If that keeps your memory consumption constant, then it means that you could
be calling Dispose() on something to free it up. If it doesn't get better,
then it means that your code is holding onto the memory itself, or there is
a bug somewhere causing a problem (possible, but not terribly likely).
Second, I'd run the CLR profilers over the code and see what objects are
getting used. I think it would be most useful to take a heap snapshot after
you have done the cleanup, which should put you back to a base level. Then,
do another iteration of 100, get the same information, and compare. My guess
is that you are holding onto a reference somewhere that's keeping the data
around.
Third, you might want to change your dataset code so that it creates a new
dataset object each time around (rather than calling Clear()). It could be
that the dataset object is holding onto information. Or, you could use the
low-level database approach instead and skip the dataset altogether.
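
One way to picture the third suggestion (skipping the DataSet) is sketched below. It assumes the URL is the first column of the result set; `UrlBatch` and `ReadUrls` are invented names, and the only thing kept after the call is a plain list of strings.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class UrlBatch
{
    // Hypothetical helper: reads the first column of every row as a URL,
    // then disposes the reader so nothing but the list survives the call.
    public static List<string> ReadUrls(IDataReader reader)
    {
        List<string> urls = new List<string>();
        using (reader)
        {
            while (reader.Read())
            {
                urls.Add(reader.GetString(0));
            }
        }
        return urls;
    }
}
```

With a real database the reader would come from the provider's `IDbCommand.ExecuteReader()`; the query that selects the next 100 URLs is left to the application.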
CLR Profiler:
http://www.microsoft.com/downloads/details.aspx?FamilyId=86CE6052-D7F4-4AEB-9B7A-94635BEEBDDA&displaylang=en
If you get really stuck, drop me a line at Eri...@microsoft.com
--
Eric Gunnerson
Visit the C# product team at http://www.csharp.net
Eric's blog is at http://weblogs.asp.net/ericgu/
This posting is provided "AS IS" with no warranties, and confers no rights.
"Mahesh Prasad" <mahesh...@hotmail.com> wrote in message
news:%23bofVZU...@TK2MSFTNGP12.phx.gbl...
> Many thanks for your efforts. The known problems I see in this thread are:
>
> - issue with large object heap fixed from FX 1.0 SP2
> - issue with Task Manager not reporting correctly
Task manager correctly reports what it is designed to report (working set
and private bytes). It doesn't know anything about managed code though.
To see details about managed heap usage you should use perfmon.
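
To make the distinction concrete, here is a small sketch that prints the operating-system figures next to the managed heap size. `MemoryReport` and `Snapshot` are invented names, and the 64-bit `Process` properties assume .NET 2.0 or later. It does not replace perfmon; it only shows that the numbers measure different things.

```csharp
using System;
using System.Diagnostics;

public static class MemoryReport
{
    // Hypothetical helper: one-line summary of OS-level and managed memory figures.
    public static string Snapshot()
    {
        using (Process p = Process.GetCurrentProcess())
        {
            long workingSet = p.WorkingSet64;            // physical memory currently in the working set
            long privateBytes = p.PrivateMemorySize64;   // private memory of the process
            long managedHeap = GC.GetTotalMemory(false); // bytes the GC believes are allocated
            return "working set=" + workingSet
                 + " private=" + privateBytes
                 + " managed=" + managedHeap;
        }
    }
}
```

Logging `MemoryReport.Snapshot()` once per batch would show whether the managed figure or only the process-level figures are growing.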
> From what you said I thought there was another issue not yet fixed?
Could it be this one?
http://support.microsoft.com/default.aspx?scid=kb;en-us;833610
It's possible. A rather vague tech note, which is a bad sign.
Tim
protected void SaveImageLocally(ref HttpWebResponse resp)
{
    Stream stream = resp.GetResponseStream();
    // This stream does not support seeking, so it cannot return length.
    // so allocate enough memory to save the binary image data.
    byte[] buffer = new Byte[10000];
    int BytesToRead = (int)buffer.Length;
    int BytesRead = 0;
    int n = 0;
    do
    {
        n = stream.Read(buffer, BytesRead, BytesToRead);
        BytesToRead -= n;
        BytesRead += n;
    } while (n > 0);
    stream.Close();
    FileStream fs = new FileStream(localFilePath, FileMode.Create);
    fs.Write(buffer, 0, BytesRead);
    fs.Close();
    buffer = null; // making sure GC frees this memory
}
As the byte[] has function scope, the 10K byte array was getting created
every time the SaveImageLocally() function was called. I did this in the
belief that the GC would free all the memory once the array went out of
function scope.
But as I found out, it does not!!!
So to get around this problem, I made the "byte[] buffer" array a member of
my class, thus giving it an object scope, rather
than a function scope. In other words, I'm now reusing the buffer for all
the images.
I went thru tons of articles on .NET GC to understand how it works ...and
found lotsa interesting stuff, like object resurrection and
how objects that implement Finalize() do not get released even after the
first GC.
Bottom line : Even if technologies like Java and .NET promise to relieve you
from memory management woes, they have their
own set of quirks and problems. I'm going to be more careful about how I
write my code in .NET from now on ( maybe more
careful than when I was writing in C++ :-)
Thanks everyone for their time and help!
Mahesh
It certainly *should* do. For instance:
using System;

public class Test
{
    static void Main()
    {
        for (int i=0; i < 10000; i++)
        {
            Generate10K();
        }
    }

    static void Generate10K()
    {
        byte[] array = new byte[10000];
        for (int i=0; i < array.Length; i++)
        {
            array[i]=(byte)i;
        }
    }
}
works fine on my box, certainly without taking 100Mb of memory!
In the code you posted, however, there *are* other problems/oddnesses:
o You're passing the WebResponse by reference for no reason
o You don't need to cast buffer.Length to an int - it's already an int
o You're not closing the streams if an exception is generated - use
a using block
o You don't need the "buffer=null;" at the end of the method
(You are also inconsistent in your variable names, but that's a
different matter.)
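
The points in the list above could be folded into a rewrite along these lines. It is only a sketch: `ImageSaver` and `CopyToFile` are invented names, and the method takes a plain `Stream` (for example the result of `resp.GetResponseStream()`) so that it can be shown without a network call. It also writes each chunk as it is read, so the saved image is not limited to the size of the buffer.

```csharp
using System;
using System.IO;

public static class ImageSaver
{
    // Hypothetical helper: copies everything from source into a new file
    // and returns the number of bytes written. Both streams are closed by
    // the using blocks even if an exception is thrown.
    public static long CopyToFile(Stream source, string localFilePath)
    {
        byte[] buffer = new byte[8192]; // scratch buffer, reused for every chunk
        long total = 0;
        using (source)
        using (FileStream fs = new FileStream(localFilePath, FileMode.Create))
        {
            int n;
            // Read returns 0 at end of stream; each chunk is written immediately.
            while ((n = source.Read(buffer, 0, buffer.Length)) > 0)
            {
                fs.Write(buffer, 0, n);
                total += n;
            }
        }
        return total;
    }
}
```

There is no `ref` parameter, no cast on `buffer.Length`, and no `buffer = null`; the local array simply becomes unreachable when the method returns.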
You may just have hit a bug in the garbage collector in this case, but
they're rare - I certainly wouldn't start changing the design of your
code just in case you hit a bug, as you'll only end up with other bugs
instead.
--
Jon Skeet - <sk...@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Anyways this is a better design, as I'm reusing the buffer instead of
creating it nearly 20,000 times... thus relieving the GC of collecting that
many objects.
Thanks,
-Mahesh
> In the code you posted, however, there *are* other problems/oddnesses:
> o You're passing the WebResponse by reference for no reason
> o You don't need to cast buffer.Length to an int - it's already an int
> o You're not closing the streams if an exception is generated - use
> a using block
> o You don't need the "buffer=null;" at the end of the method
"Jon Skeet [C# MVP]" <sk...@pobox.com> wrote in message
news:MPG.1a6bb5c1e...@msnews.microsoft.com...
But using a using statement makes it just that much simpler - you can't
accidentally get it wrong, basically.
> When you can't figure out why the memory doesn't release as expected, you
> tend to do try out various things like setting buffer=null, hoping this might
> just release memory.
Sure - I'd just take it out again as soon as I'd found that it didn't
work :)
> Anyways this is a better design, as I'm reusing the buffer instead of
> creating it nearly 20,000 times... thus relieving the GC of collecting that
> many objects.
Well, that depends - if it means that byte array ends up in a later
generation, it may well *not* be a better design. It doesn't sound like
it's really a logical part of your class.
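
The generation point can be checked with a small experiment. The sketch below (with the invented names `GenerationProbe` and `Observe`) asks the runtime which generation a buffer is in while it is freshly allocated and again after it has survived forced collections. The exact numbers depend on the runtime and its GC settings, so none are promised here.

```csharp
using System;

public static class GenerationProbe
{
    // Hypothetical helper: returns the generation of a 10,000-byte buffer
    // before and after two forced collections that it survives.
    public static int[] Observe()
    {
        byte[] buffer = new byte[10000];
        int before = GC.GetGeneration(buffer);
        GC.Collect();
        GC.Collect();
        int after = GC.GetGeneration(buffer);
        GC.KeepAlive(buffer); // keep the buffer referenced across the collections
        return new int[] { before, after };
    }
}
```

A buffer stored in a class member behaves like the surviving buffer here: it stays reachable across collections, so it tends to end up in an older generation than a short-lived local would.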
> You'll get no argument from me here. Did I tell you how I found this was a
> .NET bug? You probably don't want to know - I'm bald and toothless because
> of this one.
>
> Anyways, it is slated for a fix so if you hang on, it shouldn't be that
> long. [...] There is a long discussion on the C# Yahoo group as to where
> exactly the bug is located. I remember someone saying it was
> deep in the win32 mem code - meaning it was not easy to fix.
Nah... First I would say if it were in Win32, it should be very easy to fix
since there would be nothing magical about memory allocation in Win32 and
the algorithms should be comparatively simple. Second, the problem would
have been noticed a lot earlier since not only CLR applications would be
subject to it.
So my guess would be that "It's in Win32, not easy to fix" is the standard
answer to any of the following situations:
- we have no clue
- we have more urgent matters to attend to
- we don't care
I bet they have this phone/mail script on the wall that has an answer to any
reported problem with a couple of universally applicable
"send-'m-into-the-woods" kind of answers to be used in cases where there is
no acceptable answer yet.
Martin.
I'm not really clear on what this means.
I have similar symptoms and set MaxWorkingSet and the total process memory
stays low (the Gen 2 Heap size keeps going up). What does this imply?
thx
John
"Richard A. Lowe" <cha...@yumspamyumYahoo.com> wrote in message
news:urJuy1W0...@tk2msftngp13.phx.gbl...