Here's my code:
URL url = new URL("http://www.myurl.com");
URLConnection conn = url.openConnection();
String content = conn.getContent().toString();
System.out.println(content);
However, the call to getContent() returns the following:
sun.net.www.MeteredStream@129206
which is, I assume, an object reference. This would be fair enough as
getContent has an Object return type, but the API spec tells me that
getContent() returns the URL contents - which I take to mean the content
found at that URL (the HTML I'm trying to access).
So...how do I get access to the HTML of a page?!
TIA
That won't work - you'll have to open an InputStream and read from that.
Why? Or should I say - how? Is this not the same thing (wrt the URL class)
as opening a URLConnection and calling getContent()?
Sorted it out with...
URL url = new URL(myURL);
URLConnection conn = url.openConnection();
InputStream in = conn.getInputStream();
BufferedReader fromServer = new BufferedReader(new InputStreamReader(in));
for (String str = null; (str = fromServer.readLine()) != null;)
{
    System.out.println(str);
}
Cheers for pointing me in the right direction.
> Sorted it out with...
>
> URL url = new URL(myURL);
> URLConnection conn = url.openConnection();
> InputStream in = conn.getInputStream();
> BufferedReader fromServer = new BufferedReader(new InputStreamReader(in));
> for (String str = null; (str = fromServer.readLine()) != null;)
> {
>     System.out.println(str);
> }
There's a problem with this: it assumes that the file will be in the platform default
encoding. If it isn't, the output will often be rubbish.
Change the declaration of the BufferedReader to:
BufferedReader fromServer = new BufferedReader(new InputStreamReader (in,
conn.getContentEncoding()));
Thanks for that!
That seems to introduce the following exception:
java.lang.NullPointerException
at sun.io.Converters.getConverterClass(Converters.java:73)
at sun.io.Converters.newConverter(Converters.java:133)
at sun.io.ByteToCharConverter.getConverter(ByteToCharConverter.java:62)
at java.io.InputStreamReader.<init>(InputStreamReader.java:73)
Looks like you're already hitting the quite serious problem of an HTML file
using an encoding that your Java installation doesn't understand.
If you insert a "System.out.println(conn.getContentEncoding())", what
does it print?
Ah. Then it seems the server doesn't specify an encoding. In that case, you
can't really do much; just check for null and assume "ISO-8859-1" in that case.
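A minimal sketch of that null check, under a few assumptions. One detail worth
flagging: getContentEncoding() reports the Content-Encoding header, which
normally describes compression (gzip and the like), whereas the character set
usually travels as a charset parameter on Content-Type, available from
getContentType(). The class and method names here (CharsetFromHeader,
charsetFor) are made up for illustration; the helper falls back to ISO-8859-1
when the header is missing or names a charset this JVM doesn't know.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetFromHeader {
    // Hypothetical helper: pulls the charset parameter out of a Content-Type
    // value such as "text/html; charset=UTF-8". Returns ISO-8859-1 when the
    // value is null, has no charset parameter, or names an unknown charset.
    static Charset charsetFor(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                // Case-insensitive match on the "charset=" prefix.
                if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                    String name = p.substring(8).trim().replace("\"", "");
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        // Malformed or unsupported name: use the fallback.
                        break;
                    }
                }
            }
        }
        return StandardCharsets.ISO_8859_1;
    }

    public static void main(String[] args) {
        System.out.println(charsetFor("text/html; charset=UTF-8"));
        System.out.println(charsetFor(null));
    }
}
```

With something like this, the reader could be built as
new InputStreamReader(in, charsetFor(conn.getContentType())), which never
passes null to the constructor.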
You could read the header of the HTML file and see whether it declares an
encoding, but that would be pretty difficult.
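For what it's worth, a rough sketch of that header check is below. It is not a
real HTML parser: it assumes the page uses an ASCII-compatible encoding (so it
would miss UTF-16), that the declaration sits in the first chunk of bytes you
hand it, and that a regular expression is good enough to spot it. The names
MetaCharsetSniffer and sniff are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MetaCharsetSniffer {
    // Matches <meta charset="x"> as well as
    // <meta http-equiv="Content-Type" content="text/html; charset=x">.
    private static final Pattern META_CHARSET = Pattern.compile(
            "<meta[^>]*?charset\\s*=\\s*[\"']?([A-Za-z0-9._:-]+)",
            Pattern.CASE_INSENSITIVE);

    // Returns the charset name declared in the given leading bytes of the
    // page, or null if no declaration is found there.
    static String sniff(byte[] prefix) {
        // ISO-8859-1 maps every byte to a char, so decoding cannot fail and
        // ASCII markup survives unchanged.
        String head = new String(prefix, StandardCharsets.ISO_8859_1);
        Matcher m = META_CHARSET.matcher(head);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        byte[] page = "<html><head><meta charset=\"UTF-8\"></head></html>"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(sniff(page));
    }
}
```

In practice you would have to buffer the first part of the stream (for
example with BufferedInputStream and mark/reset), sniff it, and only then wrap
the stream in an InputStreamReader, still falling back to ISO-8859-1 when
sniff returns null.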
Thanks for your help Michael.