How should I interpret the stream I'm getting?
I'm using the following code:
URL u;
InputStream is = null;
DataInputStream dis;
String s;
try {
    u = new URL("http://www.collegehumor.com:80/video:1674301");
    is = u.openStream(); // throws an IOException
    dis = new DataInputStream(new BufferedInputStream(is));
    while ((s = dis.readLine()) != null) {
        System.out.println(s);
    }
}
catch (MalformedURLException mue) {
} catch (IOException ioe) {
} finally {
    try {
        is.close();
    } catch (IOException ioe) {
    }
} // end of 'finally' clause
} // end of main
On Oct 22, 1:11 am, mic...@gmail.com wrote:
> I am trying to read the text of a website using a URL object and a data
> stream.
> It works well on CNN.com, for example, but doesn't work well on:
> http://www.collegehumor.com:80/video:1674301
>
What makes you think it does not work?
> How should I interpret the stream I'm getting?
As HTML?
I don't get exactly what you want to do, but have you considered
Jakarta HttpClient?
--
Régis
Looks as if that URL is returning its content GZIP'ed.
Try wrapping the InputStream in a GZIPInputStream.
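
A minimal sketch of that suggestion, assuming the server announces the compression in a Content-Encoding header and that the page text is UTF-8 (both are assumptions, not something confirmed for this site). The class and method names are made up for illustration:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

class GzipAwareReader {

    // Wraps the raw stream in a GZIPInputStream only when the declared
    // content encoding is "gzip"; otherwise returns the stream unchanged.
    static InputStream decode(InputStream raw, String contentEncoding) throws IOException {
        if (contentEncoding != null && contentEncoding.trim().equalsIgnoreCase("gzip")) {
            return new GZIPInputStream(raw);
        }
        return raw;
    }

    // Reads the (already decoded) stream as text, one line at a time.
    static String readText(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    // Fetches a page over a URLConnection; UTF-8 is an assumption here.
    static String fetch(String address) throws IOException {
        URLConnection conn = new URL(address).openConnection();
        InputStream body = decode(conn.getInputStream(), conn.getContentEncoding());
        return readText(body, StandardCharsets.UTF_8);
    }
}
```

Calling GzipAwareReader.fetch(address) with the URL in question should then print HTML rather than compressed bytes, provided the header is actually present.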
Arne
> http://www.collegehumor.com:80/video:1674301
>
> How should I interpret the stream I'm getting?
I guess it's a video stream, so you should read it as binary and pass
it to a media library if you want to show it.
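
Whatever the payload turns out to be, reading it as raw bytes avoids the line splitting and character decoding that readLine() performs. A minimal sketch; the class and method names are illustrative and not taken from any library:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

class BinaryRead {

    // Copies every byte of the stream into memory, with no line
    // splitting and no character decoding.
    static byte[] readAllBytes(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }
}
```

The resulting byte[] can then be inspected, saved, or handed to whatever decoder fits the actual content type.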
> while ((s = dis.readLine()) != null) {
Last I checked, video formats were not line-oriented.
This source loads and displays (crudely) the web page
at that address.
<sscce>
import javax.swing.*;
import java.net.URL;

public class ShowURL {

    public static void main(String[] args) {
        String address = null;
        if (args.length==0) {
            address = JOptionPane.showInputDialog(null, "URL?");
        } else {
            address = args[0];
        }
        JEditorPane jep = null;
        try {
            URL url = new URL(address);
            jep = new JEditorPane(url);
        } catch(Exception e) {
            jep = new JEditorPane();
            jep.setText( e.toString() );
        }
        JScrollPane jsp = new JScrollPane(jep);
        jsp.setPreferredSize(new java.awt.Dimension(400,300));
        JOptionPane.showMessageDialog(null, jsp);
    }
}
</sscce>
..so the data is readable, and it is a web-page.
Andrew T.
>
> Régis Décamps wrote:
>> On Oct 22, 1:11 am, mic...@gmail.com wrote:
>> > I am trying to read the text of a website using a URL object and a
>> > data stream.
>> > It works well on CNN.com, for example, but doesn't work well on:
>> > http://www.collegehumor.com:80/video:1674301
>> >
>>
>> What makes you think it does not work?
> The fact that instead of normal HTML text I'm getting gibberish like this:
> <?s?6²¿w¦? ??E?9 ¿$J´-e ?I|/N|¶?^s???$$1¦ ??l«???·? IQ²?v??¼d ? X` ?? ?~8?tr??? ?e??\~~????hm]??>????S??÷7 ??1?MB?4?B ?H×?>jD?e??@×???;÷v?'S??J @X&vV??¬?³ d?6??»#| ¿x?h
> ¯?,£ ?¶?o??n¨??8cq?¾Y-?F|y7?2??? ???3??, ?)o =·m
> ? RL?l¨?e6?I©7
>>
As another poster already said, this is gzip encoded.
When I do this sort of thing I just grab the data stream to a byte[] -
then take a look at the headers to see what the encoding is when I have
the whole message.
I found that it is necessary to search for the GZIP signature bytes
to locate the start of the gzip stream after the headers.
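
A rough sketch of that signature search, assuming the byte[] holds a raw response with the headers still in front of the body (for example, read from a plain socket) and that the body is not chunked. With URL.openStream() the headers are already stripped, so the gzip data would start at index 0. The class and method names are hypothetical:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;

class GzipSignatureScan {

    // Returns the index of the first gzip signature (0x1f 0x8b, followed
    // by 0x08 for the deflate method), or -1 if none is found.
    static int indexOfGzip(byte[] data) {
        for (int i = 0; i + 2 < data.length; i++) {
            if ((data[i] & 0xff) == 0x1f
                    && (data[i + 1] & 0xff) == 0x8b
                    && data[i + 2] == 0x08) {
                return i;
            }
        }
        return -1;
    }

    // Decompresses everything from the given offset to the end of the array.
    static byte[] gunzipFrom(byte[] data, int offset) throws IOException {
        try (GZIPInputStream gz = new GZIPInputStream(
                new ByteArrayInputStream(data, offset, data.length - offset))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = gz.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        }
    }
}
```

Matching on three bytes rather than two lowers the chance of a false hit, but a scan like this is still a heuristic; checking the Content-Encoding header is the more reliable test when the headers are available.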
Bill