Anyway, the header of an HTTP response ends when you have "\r\n\r\n".
BufferedReader's readLine treats that as two lines, since it considers
"\r\n" to be a line terminator. Since it also strips off the line
terminators, readLine should return the second line as "".
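To illustrate that behavior without a network connection, here is a minimal sketch that feeds a made-up response string through BufferedReader. The class name ReadLineDemo, the helper readAllLines, and the sample response text are invented for this example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class ReadLineDemo {
    // Collects every line that BufferedReader.readLine() returns for the text.
    static List<String> readAllLines(String text) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Hypothetical response text, used only for illustration.
        String response =
            "HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\n<html></html>";
        for (String line : readAllLines(response)) {
            // The blank line that ends the header shows up with length 0.
            System.out.println("'" + line + "' (length " + line.length() + ")");
        }
    }
}
```

The third line returned is the empty string, which is the marker the header-skipping loop needs to detect.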
Per that, I've written a program that will loop, continuously, until ""
is encountered. Unfortunately, "" never appears to be encountered and
thus I have an infinite loop.
Here's my code:
import java.net.*;
import java.io.*;

public class HttpRequestor
{
    public static void main(String[] args) {
        try {
            Socket sock = new Socket("www.google.com", 80);
            String httpRequest = "GET / HTTP/1.0\r\nHost: www.google.com\r\n\r\n";
            sock.getOutputStream().write(httpRequest.getBytes());
            BufferedReader text = new BufferedReader(
                new InputStreamReader(sock.getInputStream()));
            String line, output = "";
            while (text.readLine() != "");
            while ((line = text.readLine()) != null) {
                System.out.println("\r\n'"+URLEncoder.encode(line)+"'\r\n");
            }
        }
        catch (Exception e) {
            e.printStackTrace();
        }
    }
}
To confirm that I was indeed getting "" back from readLine, I wrote the
following:
import java.net.*;
import java.io.*;

public class HttpRequestor
{
    public static void main(String[] args) {
        try {
            Socket sock = new Socket("www.google.com", 80);
            String httpRequest = "GET / HTTP/1.0\r\nHost: www.google.com\r\n\r\n";
            sock.getOutputStream().write(httpRequest.getBytes());
            BufferedReader text = new BufferedReader(
                new InputStreamReader(sock.getInputStream()));
            String line, output = "";
            while ((line = text.readLine()) != null) {
                System.out.println("\r\n'"+URLEncoder.encode(line)+"'\r\n");
            }
        }
        catch (Exception e) {
            e.printStackTrace();
        }
    }
}
This shows that "" is indeed being returned by readLine. So why
doesn't the while loop in the first program terminate when "" is
received?
Any insights would be appreciated - thanks!
Because you compare strings with == (identity) instead of with equals()
(equivalence).
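A minimal sketch of the difference follows. The class and method names are made up for illustration, and new String("") merely stands in for the fresh String object that readLine() builds at run time.

```java
public class StringCompareDemo {
    // Identity: true only when both references point to the same object.
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    // Equivalence: true when both strings contain the same characters.
    static boolean sameContents(String a, String b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        // A distinct empty String object, as opposed to the "" literal.
        String fromReader = new String("");
        System.out.println("same object:   " + sameObject(fromReader, ""));
        System.out.println("same contents: " + sameContents(fromReader, ""));
    }
}
```

The two strings hold identical (empty) contents but are different objects, so the != test in the original loop keeps succeeding and the loop never ends.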
robert
[snip most of the code]
> Socket sock = new Socket("www.google.com", 80);
I recommend against using google as your test server. Google does some
funky stuff when it detects that Java is connecting to it, which may give
you unexpected results.
- Oliver
Good suggestion, except for two things. First, he isn't using Java's URL
API, which is what's responsible for setting the User-Agent string. Second,
you can override the User-Agent string, and Google couldn't possibly
know the difference.
In any case, the OP's problem is that he is comparing with line == "",
when he should use line.equals(""), or better yet line.length() == 0.
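As a rough sketch of the corrected header-skipping loop (HeaderSkipper and skipHeaders are made-up names, and the response text is a hypothetical sample read from a StringReader rather than a socket):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class HeaderSkipper {
    // Consumes lines up to and including the blank line that ends the header.
    // Returns the number of non-blank header lines that were skipped.
    static int skipHeaders(BufferedReader reader) {
        try {
            int count = 0;
            String line;
            // Stop at end of stream (null) or at the empty line.
            while ((line = reader.readLine()) != null && line.length() != 0) {
                count++;
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical response text, used only for illustration.
        BufferedReader reader = new BufferedReader(new StringReader(
            "HTTP/1.0 200 OK\r\nServer: example\r\n\r\nbody line"));
        System.out.println(skipHeaders(reader) + " header lines skipped");
        System.out.println("first body line: " + reader.readLine());
    }
}
```

The extra null check matters too: if the connection closes before a blank line arrives, readLine() returns null and the loop still ends.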
HTH,
Daniel.
> Oliver Wong wrote:
> > I recommend against using google as your test server. Google does
> > some funky stuff when it detects that Java is connecting to it, which
> > may give you unexpected results.
[...]
> Good suggestion, except for two things. First, he isn't using Java's URL
> API, which is what's responsible for setting the User-Agent string. Second,
> you can override the User-Agent string, and Google couldn't possibly
> know the difference.
I agree with Oliver's advice. Google is perfectly at liberty to treat requests
differently depending on how they /appear/ to have been submitted.
If I were them I would group requests into at least three categories: ones that
appear to be legit (as far as we can tell from the various meta-info in a
request); those that appear to come from frequently abused clients (such as the
Java stuff); and those where we can't tell much. I would be less aggressive
about -- say -- shutting off an over-eager client IP address if the requests
appeared to be from a normal browser than if they appeared to come from
uncontrolled code. And I'd put the "can't tell" ones somewhere in the middle.
But the bottom line is not that Google /can/ treat requests differently
depending on apparently immaterial meta stuff, but that it /does/ do so --
which makes it a very poor example domain for a beginner (to HTTP) to test
against.
-- chris
Okay, while my point was that you can "trick" Google into thinking that
you are probably a legit client, your point is well taken.
I suppose a good way to learn HTTP is to set up a web server in your own
development environment (such as Apache, Resin, etc.), and use it
instead of a third-party website. That way you also have control over
the content being produced.
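For a quick local test without installing anything, one alternative to a full server such as Apache or Resin is the small HTTP server bundled with the JDK (com.sun.net.httpserver). The sketch below is only a stand-in for a real server; the class name LocalTestServer, the port 8080, and the response body are assumptions chosen for this example.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

public class LocalTestServer {
    // Starts a server on localhost that answers every request with a fixed
    // plain-text body. Passing port 0 lets the system pick any free port.
    // The body is assumed to be non-empty.
    static HttpServer start(int port, String body) {
        try {
            HttpServer server =
                HttpServer.create(new InetSocketAddress("localhost", port), 0);
            server.createContext("/", exchange -> {
                byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
                exchange.getResponseHeaders()
                        .set("Content-Type", "text/plain; charset=utf-8");
                exchange.sendResponseHeaders(200, bytes.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(bytes);
                }
            });
            server.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        HttpServer server = start(8080, "hello from the local test server\n");
        System.out.println("Listening on http://localhost:"
            + server.getAddress().getPort() + "/");
    }
}
```

With this running, the socket in the original program could connect to "localhost" on port 8080 instead of www.google.com on port 80, and the response content is fully under your control.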
- Daniel.