
Accessing HTML via a URL?


marty

Jan 21, 2003, 4:10:04 AM
Is it possible to get access to the HTML referred to by a URL? The
description of the java.net.URL class tells me that the class "represents a
URL and allows the data referred to by the URL to be downloaded". This
sounds like what I'm attempting to do.

Here's my code:

URL url = new URL("http://www.myurl.com");
URLConnection conn = url.openConnection();
String content = conn.getContent().toString();
System.out.println(content);

However, the call to getContent() returns the following:

sun.net.www.MeteredStream@129206

which is, I assume, an object reference. This would be fair enough as
getContent has an Object return type, but the API spec tells me that
getContent() returns the URL contents - which I take to mean the content
found at that URL (the HTML I'm trying to access).
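The printed text is what `Object.toString()` produces for a class that does not override it: the class name, `@`, and the hash code in hexadecimal. The name `MeteredStream` suggests that `getContent()` returned a stream from which the page can be read, not a `String` holding it. The sketch below uses a `ByteArrayInputStream` as an offline stand-in for that stream (an assumption so it runs without a network). The helper names are hypothetical, and `readAllBytes()` needs Java 9 or later:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ToStringDemo {
    // Rebuilds the default Object.toString() format: class name, '@', hex hash code.
    static String defaultDescription(Object o) {
        return o.getClass().getName() + "@" + Integer.toHexString(o.hashCode());
    }

    // Reads every remaining byte; ISO-8859-1 is an assumed charset for this demo.
    static String readAll(InputStream in) throws IOException {
        return new String(in.readAllBytes(), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the stream that getContent() handed back.
        InputStream stream = new ByteArrayInputStream(
                "<html>hi</html>".getBytes(StandardCharsets.ISO_8859_1));

        // Describes the object itself (class name and hash), not its content.
        System.out.println(stream.toString());
        System.out.println(stream.toString().equals(defaultDescription(stream))); // true

        // The HTML only appears once the stream is actually read.
        System.out.println(readAll(stream)); // <html>hi</html>
    }
}
```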

So...how do I get access to the HTML of a page?!

TIA


Michael Borgwardt

Jan 21, 2003, 4:38:01 AM
marty wrote:
> Is it possible to get access to the HTML referred to by a URL? The
> description of the java.net.URL class tells me that the class "represents a
> URL and allows the data referred to by the URL to be downloaded". This
> sounds like what I'm attempting to do.
>
> Here's my code:
>
> URL url = new URL("http://www.myurl.com");
> URLConnection conn = url.openConnection();
> String content = conn.getContent().toString();
> System.out.println(content);

That won't work - you'll have to open an InputStream and read from that.
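Here is a minimal sketch of that advice, written for a current JDK (try-with-resources needs Java 7 or later) rather than the 1.4-era one in this thread. The class name `FetchPage` and the method `fetch` are made up for the example. The charset is an explicit parameter, and passing `ISO-8859-1` in `main` is only an assumption, since the page's real encoding may differ:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class FetchPage {
    // Opens the URL, reads the body line by line, and returns it as one String.
    // The charset is supplied by the caller instead of the platform default.
    static String fetch(URL url, Charset charset) throws IOException {
        URLConnection conn = url.openConnection();
        StringBuilder page = new StringBuilder();
        // try-with-resources closes the reader and, with it, the underlying stream.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                page.append(line).append('\n');
            }
        }
        return page.toString();
    }

    public static void main(String[] args) throws IOException {
        // Pass the address on the command line, e.g. java FetchPage http://example.com/
        URL url = URI.create(args[0]).toURL();
        System.out.print(fetch(url, StandardCharsets.ISO_8859_1));
    }
}
```

Because `readLine()` strips line terminators, the sketch appends `\n` after every line, so `\r\n` endings from the server come back as plain `\n`.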

marty

Jan 21, 2003, 6:11:23 AM

"Michael Borgwardt" <bra...@brazils-animeland.de> wrote in message
news:b0j48v$pd14a$1...@ID-161931.news.dfncis.de...

Why? Or should I say, how? Is this not the same thing (wrt the URL class)
as opening a URLConnection and calling getContent()?


marty

Jan 21, 2003, 6:22:35 AM

"Michael Borgwardt" <bra...@brazils-animeland.de> wrote in message
news:b0j48v$pd14a$1...@ID-161931.news.dfncis.de...

Sorted it out with...

URL url = new URL(myURL);
URLConnection conn = url.openConnection();
InputStream in = conn.getInputStream();
BufferedReader fromServer = new BufferedReader(new InputStreamReader(in));
for (String str = null; (str = fromServer.readLine()) != null; )
{
    System.out.println(str);
}

Cheers for pointing me in the right direction.


Michael Borgwardt

Jan 21, 2003, 7:39:49 AM
marty wrote:

> Sorted it out with...
>
> URL url = new URL(myURL);
> URLConnection conn = url.openConnection();
> InputStream in = conn.getInputStream();
> BufferedReader fromServer = new BufferedReader(new InputStreamReader(in));
> for (String str = null; (str = fromServer.readLine()) != null; )
> {
>     System.out.println(str);
> }

There's a problem with this: it assumes that the file will be in the platform default
encoding. If it isn't, the output will often be rubbish.

Change the declaration of the BufferedReader to:
BufferedReader fromServer = new BufferedReader(new InputStreamReader (in,
conn.getContentEncoding()));
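The effect described here is easy to reproduce offline. The same bytes decode to different text depending on the charset given to the decoder, just as they would inside an `InputStreamReader`. The sketch below (class and method names are hypothetical) encodes "café" as UTF-8 and decodes it twice. Read as ISO-8859-1, the two bytes of "é" become two separate characters, giving "cafÃ©":

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class WrongCharsetDemo {
    // Decodes raw bytes with the given charset, as an InputStreamReader would.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // U+00E9 (e-acute) is encoded in UTF-8 as the two bytes 0xC3 0xA9,
        // so "caf\u00e9" becomes 5 bytes.
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8);

        String right = decode(utf8, StandardCharsets.UTF_8);      // 4 chars: c a f e-acute
        String wrong = decode(utf8, StandardCharsets.ISO_8859_1); // 5 chars: c a f A-tilde copyright-sign

        System.out.println(right.length()); // 4
        System.out.println(wrong.length()); // 5
    }
}
```

`InputStreamReader` also has a constructor that takes a `java.nio.charset.Charset` object instead of a charset name. That avoids typos in the name but still requires knowing which charset the page uses.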

marty

Jan 21, 2003, 7:58:44 AM

"Michael Borgwardt" <bra...@brazils-animeland.de> wrote in message
news:3E2D3F95...@brazils-animeland.de...

Thanks for that!


marty

Jan 21, 2003, 8:29:38 AM

"Michael Borgwardt" <bra...@brazils-animeland.de> wrote in message
news:3E2D3F95...@brazils-animeland.de...

That seems to introduce the following exception:

java.lang.NullPointerException
at sun.io.Converters.getConverterClass(Converters.java:73)
at sun.io.Converters.newConverter(Converters.java:133)
at
sun.io.ByteToCharConverter.getConverter(ByteToCharConverter.java:62)
at java.io.InputStreamReader.<init>(InputStreamReader.java:73)


Michael Borgwardt

Jan 21, 2003, 9:15:52 AM
marty wrote:
>>Change the declaration of the BufferedReader to:
>>BufferedReader fromServer = new BufferedReader(new InputStreamReader (in,
>>conn.getContentEncoding()));
>
>
> That seems to introduce the following exception:
>
> java.lang.NullPointerException
> at sun.io.Converters.getConverterClass(Converters.java:73)
> at sun.io.Converters.newConverter(Converters.java:133)
> at
> sun.io.ByteToCharConverter.getConverter(ByteToCharConverter.java:62)
> at java.io.InputStreamReader.<init>(InputStreamReader.java:73)

Looks like you're already hitting the quite serious problem of an HTML file
using an encoding that your Java installation doesn't understand.
If you insert a "System.out.println(conn.getContentEncoding())", what
does it print?
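Note that `getContentEncoding()` returns the value of the `Content-Encoding` response header. That header names a compression scheme such as `gzip`, and it is normally absent for plain HTML, so the method can return `null` even when the server does declare a character set. A `null` charset name passed to `InputStreamReader` is consistent with the `NullPointerException` in the trace above. When the server sends a charset, it appears as a parameter of the `Content-Type` header (for example `text/html; charset=UTF-8`), which `getContentType()` returns. The sketch below compares the two against a throwaway local server. It uses the JDK's `com.sun.net.httpserver` package only so that it runs without network access; the page, the header value, and the class and method names are made up for the example:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

public class HeaderCheck {
    // Returns { Content-Encoding, Content-Type } as reported by URLConnection.
    static String[] headers(URL url) throws IOException {
        URLConnection conn = url.openConnection();
        String[] result = { conn.getContentEncoding(), conn.getContentType() };
        conn.getInputStream().close(); // release the connection
        return result;
    }

    // Starts a local demo server that sends a small UTF-8 page with
    // "Content-Type: text/html; charset=UTF-8" and no Content-Encoding header.
    static HttpServer startDemoServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            byte[] body = "<html>caf\u00e9</html>".getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "text/html; charset=UTF-8");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = startDemoServer();
        try {
            URL url = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/").toURL();
            String[] h = headers(url);
            System.out.println("getContentEncoding(): " + h[0]); // null
            System.out.println("getContentType():     " + h[1]); // text/html; charset=UTF-8
        } finally {
            server.stop(0);
        }
    }
}
```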

marty

Jan 21, 2003, 10:04:59 AM

"Michael Borgwardt" <bra...@brazils-animeland.de> wrote in message
news:b0jkht$q83ca$2...@ID-161931.news.dfncis.de...
It outputs the string value "null".


Michael Borgwardt

Jan 21, 2003, 10:28:41 AM
marty wrote:
>>Looks like you're already hitting the quite serious problem of an HTML file
>>using an encoding that your Java installation doesn't understand.
>>If you insert a "System.out.println(conn.getContentEncoding())", what
>>does it print?
>>
>
> It outputs the string value "null".

Ah. Then it seems the server doesn't specify an encoding. In that case, you
can't really do much; just check for null and assume "ISO-8859-1" in that case.
You could read the header of the HTML file and see whether it declares an
encoding, but that would be pretty difficult.
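Here is a rough sketch of that strategy, under stated assumptions. It takes the `charset` parameter from the `Content-Type` header if there is one. Failing that, it looks for a `charset=` declaration in the first bytes of the page. Failing that, it falls back to ISO-8859-1 as suggested above, which is also the HTTP/1.1 default for `text` types in RFC 2616. The regular expression is deliberately loose and is not a real HTML parser; a page encoded in UTF-16, for example, would defeat the byte sniffing. The class and method names are hypothetical:

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharsetGuess {
    // Matches charset=NAME with optional quotes, as found in both the
    // Content-Type header and <meta> tags. Loose on purpose; not a full parser.
    private static final Pattern CHARSET = Pattern.compile(
            "charset\\s*=\\s*[\"']?([A-Za-z0-9._:-]+)", Pattern.CASE_INSENSITIVE);

    // Returns the charset named in text, or null if none is named
    // or this Java installation does not support it.
    static Charset findCharset(String text) {
        if (text == null) return null;
        Matcher m = CHARSET.matcher(text);
        if (!m.find()) return null;
        try {
            return Charset.forName(m.group(1));
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            return null;
        }
    }

    // Header first, then the start of the page, then ISO-8859-1 as the fallback.
    static Charset choose(String contentType, byte[] firstBytes) {
        Charset cs = findCharset(contentType);
        if (cs == null) {
            // ISO-8859-1 maps every byte to one char, so ASCII markup survives.
            cs = findCharset(new String(firstBytes, StandardCharsets.ISO_8859_1));
        }
        return cs != null ? cs : StandardCharsets.ISO_8859_1;
    }

    public static void main(String[] args) {
        byte[] page = ("<meta http-equiv=\"Content-Type\" "
                + "content=\"text/html; charset=us-ascii\">").getBytes(StandardCharsets.US_ASCII);
        System.out.println(choose("text/html; charset=utf-8", page)); // UTF-8
        System.out.println(choose("text/html", page));                // US-ASCII
        System.out.println(choose(null, new byte[0]));                // ISO-8859-1
    }
}
```

To apply this to a live connection, one option is to wrap the connection's stream in a `BufferedInputStream`, call `mark()`, read the first kilobyte or so, call `reset()`, and then build the `InputStreamReader` with the chosen charset.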

marty

Jan 21, 2003, 10:43:57 AM

"Michael Borgwardt" <bra...@brazils-animeland.de> wrote in message
news:b0joqe$q46hd$1...@ID-161931.news.dfncis.de...

Thanks for your help Michael.

