On 27.08.2012 05:00, 孙鹏 wrote:
> Hi, everyone!
>
> I'm using BeautifulSoup4 to process some of my xml files. And I'm processing around 80k web pages.
>
> So, basically, I use os.walk to walk through a 'root' folder and process all the xml files in that folder recursively.
>
> And here's the problem:
>
> In the method that does the work, I'm using these 3 lines of code:
>
> print xmlData
> soup = BeautifulSoup(xmlData, 'xml')
> print soup
>
> And I get that runtime error on exactly the 3908th file (as mentioned above, I have to process 80k pages). I've tried
> several times and get the same runtime error, which tells me the maximum recursion depth was exceeded. But when I
> copied that file to my Mac, it worked perfectly for that single file.
>
> So I'm wondering what is wrong here. My server runs Ubuntu 12.04 with BeautifulSoup version
> 4-4.1.3 and Python 2.7.
>
I had a similar problem a few hours ago using version 3.2.1.
Here is my solution:
http://stackoverflow.com/questions/10118160/beautifulsoup-maximum-recursion-depth-reached/12156428#12156428
If you have tags nested about 480 levels deep and you convert such a tag to str/unicode, you get
"RuntimeError: maximum recursion depth exceeded". Every level of nesting needs two nested method calls, so you soon hit
the default limit of 1000 nested Python calls. You can raise this limit, or you can use this helper. It extracts all
text from the HTML and wraps it in a <pre> element:
def beautiful_soup_tag_to_unicode(tag):
    try:
        return unicode(tag)
    except RuntimeError as e:
        if not str(e).startswith('maximum recursion'):
            raise
        # If you have more than 480 level of nested tags you can hit the maximum recursion level
        out = []
        for mystring in tag.findAll(text=True):
            mystring = mystring.strip()
            if not mystring:
                continue
            out.append(mystring)
        return u'<pre>%s</pre>' % '\n'.join(out)
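
The other option mentioned above, raising the limit, could look roughly like the sketch below. It is only an illustration: the name recursion_limit is made up for this mail and is not part of BeautifulSoup, and a suitable value depends on how deeply your documents are nested. Setting the limit very high can crash the interpreter with a C stack overflow instead of raising a clean RuntimeError, so raise it only as far as you need.

```python
import sys
from contextlib import contextmanager


@contextmanager
def recursion_limit(limit):
    """Temporarily raise the interpreter's recursion limit (hypothetical helper)."""
    old_limit = sys.getrecursionlimit()
    # Never lower the limit; only raise it if the requested value is higher.
    sys.setrecursionlimit(max(old_limit, limit))
    try:
        yield
    finally:
        # Restore the previous limit, even if the body raised an exception.
        sys.setrecursionlimit(old_limit)
```

You would then wrap only the conversion that fails, for example "with recursion_limit(5000): print soup", so the rest of the program keeps the default limit.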
--
Thomas Guettler,
http://www.thomas-guettler.de/
E-Mail: guettli (*) thomas-guettler + de