Some website content causes BeautifulSoup to restart Python session


Ryan Whalen

May 10, 2012, 6:50:46 PM
to beautifulsoup
I just asked this question on Stack Overflow, but this might be a
better venue:

I'm fetching and parsing a fairly large number of webpages. I
noticed my script was ending spontaneously with a Python session
restart. So far it only seems to happen when I try to make soup out
of the nasa.gov page, e.g.:

import urllib2
from bs4 import BeautifulSoup

# Fetch the page, then hand the response object to BeautifulSoup
page = urllib2.urlopen('http://www.nasa.gov')
soup = BeautifulSoup(page)

=====================================RESTART=====================================


Does anyone know why this might be occurring and whether there's
any way I can avoid it? It doesn't throw an exception or anything;
the session just restarts. This happens on two different machines,
although I'd be interested to hear if others can't reproduce it.
(I'm using Python 2.7.2, Enthought Distribution.)

Ryan Whalen

May 10, 2012, 7:12:43 PM
to beautifulsoup
I've just tried to parse the same page with lxml and it seems to cause
the same spontaneous session restart. Interestingly, reading the page
and printing to the console seems to work fine.
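
Since fetching works and only parsing triggers the restart, one way to narrow it down might be to run the parse step in a separate interpreter and look at its exit status. The sketch below is only an illustration under assumptions: `run_isolated` and `PARSE_SNIPPET` are hypothetical names, and the child script assumes bs4 is installed and that the page bytes are passed in on stdin.

```python
import subprocess
import sys

# Hypothetical child script: reads raw HTML bytes from stdin and parses them
# with bs4. Swap in an lxml-based snippet to test that parser the same way.
PARSE_SNIPPET = (
    "import sys\n"
    "from bs4 import BeautifulSoup\n"
    "data = getattr(sys.stdin, 'buffer', sys.stdin).read()\n"
    "BeautifulSoup(data)\n"
)


def run_isolated(snippet, stdin_data=b""):
    """Run `snippet` in a separate interpreter and report how it ended.

    Returns (returncode, stderr_bytes). A nonzero return code means the child
    failed; a Python exception also leaves a traceback in stderr, while a hard
    crash typically does not.
    """
    proc = subprocess.Popen(
        [sys.executable, "-c", snippet],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    _, err = proc.communicate(stdin_data)
    return proc.returncode, err
```

With the page bytes saved from the fetch step in a variable such as `html`, calling `run_isolated(PARSE_SNIPPET, html)` should return 0 if the parse completes. A nonzero code with no traceback in stderr would suggest the interpreter itself is dying inside the parser rather than raising an exception, and the main session survives either way.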