Refreshing of urllib.urlopen()

Michael Gruenstaeudl

Feb 3, 2010, 10:33:08 PM
to pytho...@python.org
Hi,
I am fairly new to Python and need advice on the urllib.urlopen()
function. The website I am trying to open automatically refreshes
after 5 seconds and remains stable thereafter. With
urllib.urlopen().read() I can only read the initial but not the
refreshed page. How can I access the refreshed page via
urlopen().read()? I have already tried calling time.sleep() before
invoking .read() (see below), but this does not work.

page=urllib.urlopen(url)
time.sleep(20)
htmltext=page.readlines()

Thanks,
Michael G.

Steve Holden

Feb 3, 2010, 11:02:35 PM
to pytho...@python.org
When you say the page "refreshes" every 5 seconds, does it do so by
redirecting the browser to the same address with new content?

I suspect this is the case, because otherwise page.readlines() would not
return because it wouldn't have seen the "end of file" on the incoming
network stream.

You can find this out by examining the page's headers. If

page.headers['Refresh']

exists and has a value (like "5; url=http://<same url>") then browser
refresh is being used.

If that's so then the only way to access the content is to re-open the
URL and read the updated content.
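
A minimal sketch of that check-and-re-open approach, not a definitive implementation. It assumes Python 3, where urlopen() lives in urllib.request rather than urllib as in the original post. The helper names parse_refresh and fetch_following_refresh, the max_hops limit, and the example URLs are made up for illustration:

```python
import time
from urllib.parse import urljoin
from urllib.request import urlopen


def parse_refresh(value):
    """Split a Refresh value such as '5; url=http://example.com/' into
    (delay_in_seconds, target), where target is None if no URL is given."""
    delay_part, _, rest = value.partition(";")
    delay = float(delay_part.strip() or 0)
    rest = rest.strip()
    # The target is normally written as "url=...", in any letter case.
    if rest[:4].lower() == "url=":
        rest = rest[4:].strip()
    # Some servers wrap the target in quotes.
    rest = rest.strip("'\"")
    return delay, rest or None


def fetch_following_refresh(url, max_hops=5):
    """Fetch url, re-opening it for as long as a Refresh header is sent.

    max_hops is an assumed safety limit so that a page which refreshes
    forever cannot loop endlessly.
    """
    body = b""
    for _ in range(max_hops):
        with urlopen(url) as page:
            body = page.read()
            refresh = page.headers.get("Refresh")
            base = page.geturl()
        if not refresh:
            return body
        delay, target = parse_refresh(refresh)
        # Wait as long as a browser would, then request the target
        # (or the same address again if no target was given).
        time.sleep(delay)
        url = urljoin(base, target or base)
    return body
```

Sleeping between two separate requests is what differs from the original attempt: there the sleep happened between opening the connection and reading from it, so the same response was read either way.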

regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
PyCon is coming! Atlanta, Feb 2010 http://us.pycon.org/
Holden Web LLC http://www.holdenweb.com/
UPCOMING EVENTS: http://holdenweb.eventbrite.com/

Gabriel Genellina

Feb 4, 2010, 1:02:29 AM
to pytho...@python.org
On Thu, 04 Feb 2010 00:33:08 -0300, Michael Gruenstaeudl
<michael.gr...@mail.utexas.edu> wrote:

> I am fairly new to Python and need advice on the urllib.urlopen()
> function. The website I am trying to open automatically refreshes after
> 5 seconds and remains stable thereafter. With urllib.urlopen().read() I
> can only read the initial but not the refreshed page. How can I access
> the refreshed page via urlopen().read()? I have already tried calling
> time.sleep() before invoking .read() (see below), but this does not
> work.
>
> page=urllib.urlopen(url)
> time.sleep(20)
> htmltext=page.readlines()

How does the page refresh itself? If it uses a <meta http-equiv="refresh"
...> tag, look at the URL given in that tag's content attribute; that is
the address to request next.

--
Gabriel Genellina

Nobody

Feb 4, 2010, 11:46:47 AM
On Wed, 03 Feb 2010 21:33:08 -0600, Michael Gruenstaeudl wrote:

> I am fairly new to Python and need advice on the urllib.urlopen()
> function. The website I am trying to open automatically refreshes
> after 5 seconds and remains stable thereafter. With
> urllib.urlopen().read() I can only read the initial but not the
> refreshed page. How can I access the refreshed page via
> urlopen().read()? I have already tried calling time.sleep() before
> invoking .read() (see below), but this does not work.

In all probability, the server is instructing the browser to load a
different URL via either a Refresh: header or a <meta http-equiv="refresh">
tag in the page. You will have to retrieve that information then issue a
request for the new URL.
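
For the <meta> tag case, retrieving that information could look roughly like the sketch below. It assumes Python 3 and uses the standard library's html.parser; the class name MetaRefreshFinder, the function find_meta_refresh, and the sample markup are hypothetical and only serve the illustration:

```python
from html.parser import HTMLParser


class MetaRefreshFinder(HTMLParser):
    """Remember the content attribute of the first
    <meta http-equiv="refresh"> tag seen in the document."""

    def __init__(self):
        super().__init__()
        self.content = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lower case.
        if tag != "meta" or self.content is not None:
            return
        attributes = dict(attrs)
        if (attributes.get("http-equiv") or "").lower() == "refresh":
            self.content = attributes.get("content")


def find_meta_refresh(html_text):
    """Return the refresh instruction (e.g. '5; url=...') or None."""
    finder = MetaRefreshFinder()
    finder.feed(html_text)
    return finder.content


# Made-up sample page for illustration.
sample = (
    '<html><head>'
    '<META HTTP-EQUIV="Refresh" CONTENT="5; URL=http://example.com/new">'
    '</head><body>Please wait</body></html>'
)
instruction = find_meta_refresh(sample)
```

The returned string has the same "seconds; url=target" shape as a Refresh: header value. A relative target would need to be resolved against the address of the page it came from (urllib.parse.urljoin does this) before issuing the second request.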

It might even be redirecting via JavaScript, in which case, you lose (it's
possible to handle this case, but it's difficult).
