'print soup' got 'RuntimeError: maximum recursion depth exceeded'
孙鹏  
From: 孙鹏 <voidmain1313...@gmail.com>
Date: Sun, 26 Aug 2012 20:00:37 -0700 (PDT)
Local: Sun, Aug 26 2012 11:00 pm
Subject: 'print soup' got 'RuntimeError: maximum recursion depth exceeded'

Hi, everyone!

I'm using BeautifulSoup4 to process some of my XML files, around 80k
web pages in total.

Basically, I use os.walk to walk through a 'root' folder and process
every XML file under that folder recursively.
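
A minimal sketch of such a walk, assuming the per-file work is wrapped in a callable. The names process_tree and handler are hypothetical and not taken from the poster's reactor_main.py. Catching RuntimeError per file records which file fails without aborting the whole run:

```python
import os


def process_tree(root, handler):
    """Call handler(path) for every .xml file under root.

    Returns a list of (path, message) pairs for files whose handler
    raised RuntimeError (which includes recursion-depth errors).
    """
    failures = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.lower().endswith(".xml"):
                continue  # skip anything that is not an XML file
            path = os.path.join(dirpath, name)
            try:
                handler(path)
            except RuntimeError as exc:
                # Remember the failing file and keep going.
                failures.append((path, str(exc)))
    return failures
```

With a list of failures in hand, the problem files can be inspected one by one after the batch finishes.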

And here's the problem:

In the method that does the work, I'm using these 3 lines of code:

print xmlData
soup = BeautifulSoup(xmlData, 'xml')
print soup

I get that runtime error on exactly the 3908th file (as mentioned above, I
have 80k pages to process). I've tried several times and always get the same
runtime error, which tells me the maximum recursion depth was exceeded.
But when I copied that file to my Mac and processed it on its own, it
worked perfectly.

So I'm wondering what is wrong here. The server runs Ubuntu 12.04 with
beautifulsoup4 4.1.3 and Python 2.7.
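
One thing worth checking is how deeply the elements in the failing file are nested. The sketch below measures that without recursion, using the standard library's event-based parser. It assumes the file is well-formed XML, and max_nesting_depth is a hypothetical helper name, not part of BeautifulSoup:

```python
import io
import xml.etree.ElementTree as ET


def max_nesting_depth(source):
    """Return the deepest element nesting level in an XML file or stream."""
    depth = 0
    deepest = 0
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            depth += 1
            deepest = max(deepest, depth)
        else:
            depth -= 1
            elem.clear()  # drop finished children to keep memory low
    return deepest


sample = io.BytesIO(b"<a><b><c/></b><d/></a>")
print(max_nesting_depth(sample))
```

A result in the hundreds would point at the nesting depth of the document rather than at the number of files processed.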

BTW:
Here's the error log file:

Traceback (most recent call last):
  File "/home/esdop/labrador/bin/reactor", line 72, in <module>
    parseArgs()
  File "/home/esdop/labrador/bin/reactor", line 54, in parseArgs
    main(args)
  File "/home/esdop/labrador/bin/reactor", line 66, in main
    reactorObj.doReactorWork()
  File "/home/esdop/labrador/butts/reactor/reactor_main.py", line 50, in doReactorWork
    self.processFilesRecursively(self.doWork)
  File "/home/esdop/labrador/butts/reactor/reactor_main.py", line 102, in processFilesRecursively
    processFunction(root, fileName)
  File "/home/esdop/labrador/butts/reactor/reactor_main.py", line 126, in doWork
    print soup
  File "/usr/local/lib/python2.7/dist-packages/beautifulsoup4-4.1.3-py2.7.egg/bs4/element.py", line 956, in __str__
    return self.encode()
  File "/usr/local/lib/python2.7/dist-packages/beautifulsoup4-4.1.3-py2.7.egg/bs4/element.py", line 966, in encode
    u = self.decode(indent_level, encoding, formatter)
  File "/usr/local/lib/python2.7/dist-packages/beautifulsoup4-4.1.3-py2.7.egg/bs4/__init__.py", line 334, in decode
  File "/usr/local/lib/python2.7/dist-packages/beautifulsoup4-4.1.3-py2.7.egg/bs4/element.py", line 1021, in decode
    indent_contents, eventual_encoding, formatter)
  File "/usr/local/lib/python2.7/dist-packages/beautifulsoup4-4.1.3-py2.7.egg/bs4/element.py", line 1074, in decode_contents
    formatter))
  File "/usr/local/lib/python2.7/dist-packages/beautifulsoup4-4.1.3-py2.7.egg/bs4/element.py", line 1021, in decode
    indent_contents, eventual_encoding, formatter)
  File "/usr/local/lib/python2.7/dist-packages/beautifulsoup4-4.1.3-py2.7.egg/bs4/element.py", line 1074, in decode_contents
    formatter))
  File "/usr/local/lib/python2.7/dist-packages/beautifulsoup4-4.1.3-py2.7.egg/bs4/element.py", line 1021, in decode
    indent_contents, eventual_encoding, formatter)
  File "/usr/local/lib/python2.7/dist-packages/beautifulsoup4-4.1.3-py2.7.egg/bs4/element.py", line 1074, in decode_contents
    formatter))

...(repeating)

  File "/usr/local/lib/python2.7/dist-packages/beautifulsoup4-4.1.3-py2.7.egg/bs4/element.py", line 1021, in decode
    indent_contents, eventual_encoding, formatter)
  File "/usr/local/lib/python2.7/dist-packages/beautifulsoup4-4.1.3-py2.7.egg/bs4/element.py", line 1068, in decode_contents
    for c in self:
RuntimeError: maximum recursion depth exceeded
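
The alternating decode / decode_contents frames in the traceback can be reproduced with a toy class. This is only an illustration of the call pattern, not bs4's actual code. Node and nested are hypothetical names:

```python
class Node(object):
    """Toy stand-in for a tag: a name plus a list of child nodes."""

    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []

    def decode(self):
        # First stack frame per nesting level.
        return "<%s>%s</%s>" % (self.name, self.decode_contents(), self.name)

    def decode_contents(self):
        # Second stack frame per nesting level.
        parts = []
        for child in self.children:
            parts.append(child.decode())
        return "".join(parts)


def nested(depth):
    """Build a chain of `depth` nodes, each wrapping the previous one."""
    node = Node("t")
    for _ in range(depth - 1):
        node = Node("t", [node])
    return node


print(nested(3).decode())  # <t><t><t></t></t></t>
```

Because each level of nesting costs two frames, a chain of about 500 nodes is already enough to exhaust a recursion limit of 1000, while building the chain itself needs no recursion at all.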


 
Thomas Guettler  
From: Thomas Guettler <h...@tbz-pariv.de>
Date: Tue, 28 Aug 2012 12:03:36 +0200
Local: Tues, Aug 28 2012 6:03 am
Subject: Re: 'print soup' got 'RuntimeError: maximum recursion depth exceeded'
On 27.08.2012 05:00, 孙鹏 wrote:

I had a similar problem a few hours ago with version 3.2.1.

Here is my solution:
http://stackoverflow.com/questions/10118160/beautifulsoup-maximum-rec...

If you have tags nested about 480 levels deep and you convert such a tag to str/unicode, you get
"RuntimeError: maximum recursion depth exceeded". Every level needs two nested method calls, so you soon hit the default
limit of 1000 nested Python calls. You can raise this limit, or you can use this helper. It extracts all the text from the
HTML and returns it wrapped in a <pre> element:

def beautiful_soup_tag_to_unicode(tag):
    """Return tag as unicode; fall back to its plain text if it is nested too deeply."""
    try:
        return unicode(tag)
    except RuntimeError as e:
        # Only handle the recursion error; re-raise anything else.
        if not str(e).startswith('maximum recursion'):
            raise
        # With more than about 480 levels of nested tags, serializing hits
        # the maximum recursion depth, so collect the text nodes instead.
        out = []
        for mystring in tag.findAll(text=True):
            mystring = mystring.strip()
            if not mystring:
                continue
            out.append(mystring)
        return u'<pre>%s</pre>' % u'\n'.join(out)
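
The other option mentioned above, raising the limit, could be sketched as a context manager that restores the old value afterwards. The name recursion_limit is hypothetical. Only sys.getrecursionlimit and sys.setrecursionlimit are standard library calls:

```python
import sys
from contextlib import contextmanager


@contextmanager
def recursion_limit(limit):
    """Temporarily raise the interpreter's recursion limit."""
    old = sys.getrecursionlimit()
    # Never lower the limit below what is already configured.
    sys.setrecursionlimit(max(old, limit))
    try:
        yield
    finally:
        sys.setrecursionlimit(old)


# Intended use (Python 2, as in this thread):
#     with recursion_limit(5000):
#         text = unicode(soup)
```

A very high limit is not free: if the real C stack runs out before the Python limit is reached, the interpreter can crash instead of raising an exception, so it is safer to raise the limit only as far as the documents require.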

--
Thomas Guettler, http://www.thomas-guettler.de/
E-Mail: guettli (*) thomas-guettler + de


 