Maximum recursion depth exceeded


Hector Liu

Nov 19, 2010, 2:59:40 PM11/19/10
to beautifulsoup
BeautifulSoup is powerful, but I have run into some exceptions when handling
certain web pages. The most recent is "Maximum recursion depth
exceeded" when calling prettify().

It keeps calling:
s.append(c.__str__(encoding, prettyPrint, indentLevel))

Since I am working with an offline page, I cannot show the page content
here.

Does anyone have an idea what causes this?

--
Btw, my BeautifulSoup version is 3.0.8.1.

Andrew Spiers

Nov 19, 2010, 10:38:40 PM11/19/10
to beauti...@googlegroups.com
On Sat, Nov 20, 2010 at 6:59 AM, Hector Liu <hunter...@gmail.com> wrote:
> Beautifulsoup is powerful but I faced some exceptions when handling
> some webpages. The most recent one is "Maximum recursion depth
> exceeded" when call prettify()
>
> It keep calling:
> s.append(c.__str__(encoding, prettyPrint, indentLevel))
>
It is probably the recursion limit set by Python. If you've got the
memory to spare, you can change it; check out the
sys.setrecursionlimit function in the sys module:
http://docs.python.org/library/sys.html#sys.setrecursionlimit
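A minimal sketch of that suggestion. The actual soup.prettify() call is left out (it would need the offending page); a recursive function over a deeply nested list stands in for the per-tag recursion of __str__:

```python
import sys

def depth(node):
    """Recurse one level per nested child, roughly as __str__ does per tag."""
    return 1 if not node else 1 + depth(node[0])

# Build a structure nested 2000 deep, deeper than the default limit (~1000).
nested = []
for _ in range(2000):
    nested = [nested]

old_limit = sys.getrecursionlimit()
try:
    sys.setrecursionlimit(3000)   # give the deep recursion some headroom
    d = depth(nested)             # would fail under the default limit
finally:
    sys.setrecursionlimit(old_limit)  # always restore the default

print(d)  # 2001
```

Restoring the old limit in a finally block keeps the raised limit scoped to the one deep call, so the rest of the program is still protected against runaway recursion.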

Hector Liu

Nov 20, 2010, 4:58:24 AM11/20/10
to beautifulsoup
Thank you for your suggestion.

I examined that page carefully and found that it is quite bad:
there are around 2000 <li> lines, none with a closing tag,

like this:
<li><a href="http://www.swcruise.com/store/?furie=grassiness">Grassiness</a>
<li><a href="http://www.kiop.ru/addfav.php?furie=glossiest">Glossiest</a>

So the recursion really is too deep. I set the recursion limit to
3000 and can still only parse half of the page, and there is no memory
left to raise the limit any further. I think this page is simply too
malformed to be prettified.

---
Btw, after handling 30,000 pages, 320 of them (about 1%) could not be
souped. The performance is good, but I still hope some improvement can
be made. I will investigate why those pages fail after I run my
program again.
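For a batch run like this, one option is to wrap each recursion-heavy parse in a helper that temporarily raises the limit and records the failures rather than crashing. This is a hypothetical sketch (try_parse and fake_parse are made-up names; fake_parse stands in for soup.prettify(), which recurses once per unclosed <li>):

```python
import sys

def try_parse(parse, html, limit=3000):
    """Run a recursion-heavy parse under a raised limit.
    Returns (result, None) on success or (None, error) when the
    page is nested too deeply to handle."""
    old = sys.getrecursionlimit()
    try:
        sys.setrecursionlimit(limit)
        return parse(html), None
    except RecursionError as exc:   # RuntimeError on Python 2
        return None, exc
    finally:
        sys.setrecursionlimit(old)  # restore the default either way

# Stand-in for the real parser: depth grows with each unclosed <li>.
def fake_parse(html):
    tags = html.count("<li>")
    def render(n):
        return "" if n == 0 else "<li>" + render(n - 1)
    return render(tags)

ok, err = try_parse(fake_parse, "<li>" * 100)        # shallow page: fine
bad, err2 = try_parse(fake_parse, "<li>" * 100000)   # pathological page
print(ok is not None, bad is None)  # True True
```

With something like this, the 320 bad pages would end up in an error list for later inspection instead of aborting the whole crawl.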

On Nov 20, 11:38 am, Andrew Spiers <7and...@gmail.com> wrote: