I am interested in reading the text of a web page and parsing it.
After searching on this newsgroup I decided to use the following:
******************************* START OF CODE ************************
String sTemp = "http://cgi3.igl.net/cgi-bin/ladder/teamsql/team_view.cgi?ladd=teamknights&num=238&showall=1";
WebRequest myWebRequest = WebRequest.Create(sTemp);
WebResponse myWebResponse = myWebRequest.GetResponse();
Stream myStream = myWebResponse.GetResponseStream();
// default encoding is utf-8
StreamReader SR = new StreamReader( myStream );
Char[] buffer = new Char[2048];
// Read 256 characters at a time.
int count = SR.Read( buffer, 0, 2000 );
//while (count > 0)
//{
// do some processing - may read all or part
// count = SR.Read(buffer, 0, 2000);
//}
SR.Close(); // Release the resources
myWebResponse.Close();
******************************* END OF CODE ************************
This code should look very familiar because it is all over the
newsgroup and Microsoft support help pages.
The web page has a big table on it and it takes a while to download
(even with a cable modem).
What I observe is the following. If I read all of the data (i.e.
until the count > 0 test fails), stepping over SR.Close() returns
immediately. If I read only 2000 characters, as the example above
shows, stepping over SR.Close() takes a long time (around 10-15
seconds for me). This may be a coincidence, but it seems to take about
as long as reading all of the data. At this point I am starting to
believe that SR.Close() does not return until the entire web page has
been received. This is not what I want: I parse the data as it arrives
and want to stop loading early, because the full download is slow and
not always necessary.
Does anyone know how to terminate the loading of the page so I can
eliminate the delay? I had implemented this in C++ with MFC using
CInternetSession.OpenURL() and did not have this problem.
Thanks in advance.
Todd
Sami
www.capehill.net
> Maybe you should take some programming classes.
Hey! It's an arrogant spammer :-P
You can simply cut and paste the code into the button click event of a
simple Windows app. Put a breakpoint on SR.Close(). If you then step
over this statement you will observe the delay I am talking about. If
you uncomment the loop so that it reads the entire page, stepping over
this statement will be immediate.
Any constructive comments would be greatly appreciated. I'm sure that
someone out there has the answer to my question.
Thanks in advance.
I doubt that, as the code doesn't do what it advertises ;-)
Char[] buffer = new Char[2048];
// Read 256 characters at a time.
int count = SR.Read( buffer, 0, 2000 );
Why a 2 kB buffer, when you're supposedly reading only 256 chars, but you're
specifying 2000 chars for the Read() call?
> The web page has a big table on it and it takes a while to download
> (even with a cable modem).
>
> What I observe is the following. If I open and read all the data
> (i.e.
> until count > 0 fails, then stepping over SR.Close() execution time is
> immediate. If I read only 2000 bytes as the above example shows, when
> I step over SR.Close() it takes a long time (for me around 10-15
> seconds). This may be a coincidence but it seems to take the same
> amount of time as if I was reading all of the data.
Well, this particular page is an insane 6 MB in size... and the web server
does not help the client either, as it provides no Content-Length header,
just Connection: close:
HTTP/1.1 200 OK
Date: Sat, 10 Apr 2004 10:20:31 GMT
Server: Apache/1.3.24 (Unix) mod_throttle/3.1.2 PHP/4.2.0
Connection: close
Content-Type: text/html
Even more interestingly, I cannot download the entire page at all...
neither WebClient nor WebRequest/WebResponse is able to download that
beast. Both stop downloading at exactly the same position -- I guess the
underlying TCP stream is closed prematurely. This must be some WinInet
default behaviour (quirk?), as the same thing happens when I download
the page using some ancient Visual J++ code that uses plain TCP. I think
I'll write a plain HTTP client using System.Net.Sockets and see what
happens.
(Note: If the web server returns a Content-Length header, downloading the
page works just fine.)
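
As a rough illustration of the header check described above, the following
C# sketch prints what the framework reports for the response. The class
name HeaderCheck is made up for this example; the URL is the one from the
original post. HttpWebResponse.ContentLength is documented to be -1 when
the server sends no Content-Length header. The Abort() call is only there
so the sketch does not wait for the 6 MB body, and whether it removes the
delay on a given framework version is an assumption.

```csharp
using System;
using System.Net;

class HeaderCheck
{
    static void Main()
    {
        // URL from the original post.
        string url = "http://cgi3.igl.net/cgi-bin/ladder/teamsql/team_view.cgi?ladd=teamknights&num=238&showall=1";
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);

        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        {
            // -1 means the server did not send a Content-Length header.
            Console.WriteLine("ContentLength: {0}", response.ContentLength);
            Console.WriteLine("Connection:    {0}", response.Headers["Connection"]);
            Console.WriteLine("Content-Type:  {0}", response.ContentType);

            // Assumption: aborting first keeps Close()/Dispose() from
            // waiting for the rest of the body.
            request.Abort();
        }
    }
}
```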
[...]
> Does anyone know how to terminate the loading of the page so I can
> eliminate the delay? I had implemented this in C++ with MFC using
> CInternetSession.OpenURL() and did not have this problem.
Use asynchronous I/O -- see WebRequest.Abort(),
WebRequest.BeginGetResponse(), and WebRequest.EndGetResponse().
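
A minimal sketch of the Abort() idea, kept synchronous for brevity. It
reads one block of up to 2000 characters and then aborts the request
before the reader and response are closed. The class name PartialDownload
is invented for this example, and the claim that Abort() lets Close()
return without draining the remaining data is an assumption to verify
against the framework version in use.

```csharp
using System;
using System.IO;
using System.Net;

class PartialDownload
{
    static void Main()
    {
        // URL from the original post.
        string url = "http://cgi3.igl.net/cgi-bin/ladder/teamsql/team_view.cgi?ladd=teamknights&num=238&showall=1";
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);

        using (WebResponse response = request.GetResponse())
        using (StreamReader reader = new StreamReader(response.GetResponseStream()))
        {
            char[] buffer = new char[2000];

            // Read may return fewer characters than requested.
            int count = reader.Read(buffer, 0, buffer.Length);
            Console.WriteLine("Read {0} characters", count);

            // Assumption: aborting here drops the connection, so the
            // implicit Close() calls below have nothing left to wait for.
            // Do not read from the stream after this point.
            request.Abort();
        }
    }
}
```

Timing the end of the using block with and without the Abort() call
should show whether it removes the 10-15 second delay.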
Cheers,
--
Joerg Jooss
joerg...@gmx.net
I'm sorry about the comment. I believe the 256-character comment
belonged to the original example from which I copied the code. The 2000
byte read size is the actual size I was using in the program from
which I derived the sample code to illustrate my problem in this
newsgroup.
I did not notice that the entire page of data does not download. That
is a good catch. The code I wrote using Studio 6/C++ does not have
that problem (see original post).
I was able to implement the asynchronous approach using
WebRequest.BeginGetResponse() and WebRequest.EndGetResponse() that
you suggested, and I note that it also does not download the entire
page of data, as you described.
What my program does is start to download the data up to a point and
then close the connection. The reason for closing the connection is
because it takes so long to get the entire amount of data and there is
not always a need to get all of it. The implementation I have using
Studio 6/C++ does this and works perfectly. It is very disappointing
that .Net/C# does not work.
Were you able to get better results using the socket approach?
How about some of you Microsoft gurus taking a look into this problem
and answering the following two questions:
1) What do you do to download the entire page of data?
2) What can you do to close the connection with zero delay (after
reading one or more 2000 byte buffers of data)?
It is easy to set up this experiment by cutting and pasting the
sample code I gave into the button event of a simple
Windows app.
Thanks in advance!
"Joerg Jooss" <joerg...@gmx.net> wrote in message news:<eigjwru...@tk2msftngp13.phx.gbl>...
It might be helpful if you could post a code snippet of the original MFC
code. It could be doing something on the WinInet level.
> I was able to implement the asynchronous approach using
> WebRequest.BeginGetResponse() and WebRequest.EndGetResponse() that
> you suggested, and I note that it also does not download the entire
> page of data, as you described.
So it's not just me. I got kind of scared when I realized that my 18 month
old code wasn't that perfect...
> What my program does is start to download the data up to a point and
> then close the connection. The reason for closing the connection is
> because it takes so long to get the entire amount of data and there is
> not always a need to get all of it. The implementation I have using
> Studio 6/C++ does this and works perfectly. It is very disappointing
> that .Net/C# does not work.
I'm not entirely sure it's a bug in the .NET FCL. As I said, I have a '00
vintage Visual J++ application that has exactly the same defect. But after
digging through both the MFC and Rotor source code, I can't see anything
dodgy or special happening there :-(
> Were you able to get better results using the socket approach?
Didn't have time to do that so far, but I'll give it a shot.
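
For reference, the socket experiment could look roughly like the sketch
below. It sends a bare HTTP/1.0 GET over a TcpClient and counts bytes
until the server closes the connection. The class name RawHttpGet is
invented for this example, the host and path are split out of the URL in
the original post, and nothing here has been confirmed against that
server.

```csharp
using System;
using System.Net.Sockets;
using System.Text;

class RawHttpGet
{
    static void Main()
    {
        // Host and path taken from the URL in the original post.
        string host = "cgi3.igl.net";
        string path = "/cgi-bin/ladder/teamsql/team_view.cgi?ladd=teamknights&num=238&showall=1";

        using (TcpClient client = new TcpClient(host, 80))
        using (NetworkStream stream = client.GetStream())
        {
            // HTTP/1.0 request: the end of the body is signalled by the
            // server closing the connection.
            string request = "GET " + path + " HTTP/1.0\r\n" +
                             "Host: " + host + "\r\n\r\n";
            byte[] requestBytes = Encoding.ASCII.GetBytes(request);
            stream.Write(requestBytes, 0, requestBytes.Length);

            byte[] buffer = new byte[2000];
            long total = 0;
            int count;

            // Read returns 0 once the remote side has closed the connection.
            while ((count = stream.Read(buffer, 0, buffer.Length)) > 0)
            {
                total += count;
            }
            Console.WriteLine("Received {0} bytes (headers included)", total);
        }
    }
}
```

Comparing the byte count with what Internet Explorer saves for the same
URL would show whether the truncation also happens below the
WebRequest layer.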
OK, now I'm officially confused.
I just ran a slightly modified sample application from Inside Visual C++
(4th Ed), and that behaves exactly like the .NET code. That web server must
be doing something dodgy.
1) If you cut and paste the URL into Internet Explorer it reads to the
end of the page.
2) I have included a quick C++ code snippet that also reads to
the end of the page. This code was run using C++ .Net (not VS 6), but
it also worked in VS 6. Note that if you put a breakpoint on the "if
(m_pfile != NULL)" line, you will see the last buffer's worth of data
in one of the two buffers.
.h file:
#include <afxinet.h> // needed for CInternetSession
private:
CInternetSession m_session;
CStdioFile *m_pfile;
unsigned int m_buffer1_size;
char *m_data_buffer;
char m_data_buffer1[2048];
char m_data_buffer2[2048];
.cpp file:
// OpenURL returns a CStdioFile* that the caller owns; 1 is the context value.
m_pfile = m_session.OpenURL("http://cgi3.igl.net/cgi-bin/ladder/teamsql/team_view.cgi?ladd=teamknights&num=238&showall=1",
    1, INTERNET_FLAG_DONT_CACHE | INTERNET_FLAG_TRANSFER_ASCII);
m_data_buffer = m_data_buffer1;
// Alternate between the two buffers so the previous block stays available.
while ((m_buffer1_size = m_pfile->Read(m_data_buffer, 2000)) != 0) {
if (m_data_buffer == m_data_buffer1)
m_data_buffer = m_data_buffer2;
else
m_data_buffer = m_data_buffer1;
}
if (m_pfile != NULL) {
m_pfile->Close(); // close the file before closing its session
delete m_pfile; // OpenURL allocated it; delete to avoid a memory leak
m_pfile = NULL;
m_session.Close();
}
It seems like there should be a similar solution for C#.
Thanks in advance!
"Joerg Jooss" <joerg...@gmx.net> wrote in message news:<#VqKi0yI...@TK2MSFTNGP10.phx.gbl>...