Hi, everyone!
I'm using BeautifulSoup4 to process my XML files, around 80k web pages
in total.
Basically, I use os.walk to walk through a 'root' folder and process
all the XML files in that folder recursively.
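For readers who want to reproduce the setup, a traversal of this kind might look roughly like the sketch below. The function name iter_xml_files and the '.xml' suffix check are assumptions made for illustration; they are not taken from the real reactor_main.py.

```python
import os


def iter_xml_files(root_dir):
    """Yield the path of every .xml file under root_dir, recursively.

    Hypothetical helper: the real script's traversal may differ.
    """
    for dirpath, dirnames, filenames in os.walk(root_dir):
        # Sort so that the files are visited in a repeatable order,
        # which makes "the Nth file" mean the same thing on every run.
        dirnames.sort()
        for name in sorted(filenames):
            if name.lower().endswith('.xml'):
                yield os.path.join(dirpath, name)
```

Sorting dirnames in place also fixes the order in which os.walk descends into subfolders, so a failure on a particular file number is easier to track down.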
And here's the problem:
In the method that does the work, I'm using these 3 lines of code:
    print xmlData
    soup = BeautifulSoup(xmlData, 'xml')
    print soup
I get a runtime error on exactly the 3908th file (as mentioned above, I
have to process 80k pages). I've tried this several times and always get
the same runtime error, which says the maximum recursion depth was exceeded.
But when I copied that file to my Mac, it worked perfectly for that
single file.
So I'm wondering what is wrong here. My server runs Ubuntu 12.04 with
BeautifulSoup 4.1.3 and Python 2.7.
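One way to check whether the failing file is unusually deeply nested is to measure its element depth without building a tree recursively. The sketch below uses only the standard library; max_nesting_depth is a hypothetical helper name. Note the assumption that the file is well-formed XML: ElementTree is strict, while the 'xml' parser used by BeautifulSoup (lxml) may tolerate input that ElementTree rejects, so the two can see different structures.

```python
import sys
import xml.etree.ElementTree as ET
from io import BytesIO


def max_nesting_depth(xml_bytes):
    """Return the deepest element nesting level found in xml_bytes.

    Uses a streaming parse, so it does not recurse no matter how
    deeply the document is nested.
    """
    depth = 0
    deepest = 0
    events = ET.iterparse(BytesIO(xml_bytes), events=('start', 'end'))
    for event, _elem in events:
        if event == 'start':
            depth += 1
            deepest = max(deepest, depth)
        else:  # 'end'
            depth -= 1
    return deepest


if __name__ == '__main__':
    sample = b'<a><b><c/></b><b/></a>'
    print(max_nesting_depth(sample))
    print(sys.getrecursionlimit())
```

Comparing the measured depth with sys.getrecursionlimit() on both the server and the Mac may show whether the two machines differ in the limit or in how the file is parsed.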
BTW:
Here's the error log file:
Traceback (most recent call last):
  File "/home/esdop/labrador/bin/reactor", line 72, in <module>
    parseArgs()
  File "/home/esdop/labrador/bin/reactor", line 54, in parseArgs
    main(args)
  File "/home/esdop/labrador/bin/reactor", line 66, in main
    reactorObj.doReactorWork()
  File "/home/esdop/labrador/butts/reactor/reactor_main.py", line 50, in doReactorWork
    self.processFilesRecursively(self.doWork)
  File "/home/esdop/labrador/butts/reactor/reactor_main.py", line 102, in processFilesRecursively
    processFunction(root, fileName)
  File "/home/esdop/labrador/butts/reactor/reactor_main.py", line 126, in doWork
    print soup
  File "/usr/local/lib/python2.7/dist-packages/beautifulsoup4-4.1.3-py2.7.egg/bs4/element.py", line 956, in __str__
    return self.encode()
  File "/usr/local/lib/python2.7/dist-packages/beautifulsoup4-4.1.3-py2.7.egg/bs4/element.py", line 966, in encode
    u = self.decode(indent_level, encoding, formatter)
  File "/usr/local/lib/python2.7/dist-packages/beautifulsoup4-4.1.3-py2.7.egg/bs4/__init__.py", line 334, in decode
  File "/usr/local/lib/python2.7/dist-packages/beautifulsoup4-4.1.3-py2.7.egg/bs4/element.py", line 1021, in decode
    indent_contents, eventual_encoding, formatter)
  File "/usr/local/lib/python2.7/dist-packages/beautifulsoup4-4.1.3-py2.7.egg/bs4/element.py", line 1074, in decode_contents
    formatter))
  File "/usr/local/lib/python2.7/dist-packages/beautifulsoup4-4.1.3-py2.7.egg/bs4/element.py", line 1021, in decode
    indent_contents, eventual_encoding, formatter)
  File "/usr/local/lib/python2.7/dist-packages/beautifulsoup4-4.1.3-py2.7.egg/bs4/element.py", line 1074, in decode_contents
    formatter))
  File "/usr/local/lib/python2.7/dist-packages/beautifulsoup4-4.1.3-py2.7.egg/bs4/element.py", line 1021, in decode
    indent_contents, eventual_encoding, formatter)
  File "/usr/local/lib/python2.7/dist-packages/beautifulsoup4-4.1.3-py2.7.egg/bs4/element.py", line 1074, in decode_contents
    formatter))
  ...(repeating)
  File "/usr/local/lib/python2.7/dist-packages/beautifulsoup4-4.1.3-py2.7.egg/bs4/element.py", line 1021, in decode
    indent_contents, eventual_encoding, formatter)
  File "/usr/local/lib/python2.7/dist-packages/beautifulsoup4-4.1.3-py2.7.egg/bs4/element.py", line 1068, in decode_contents
    for c in self:
RuntimeError: maximum recursion depth exceeded
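
In the traceback above, decode and decode_contents call each other once per level of nesting, so every nested element costs about two Python frames. With CPython's default recursion limit of 1000, a tree nested more than a few hundred levels deep could plausibly fail in this way when printed. A common workaround is to raise the limit temporarily around the call. The sketch below uses only the standard library; call_with_recursion_limit is a hypothetical helper name, and 10000 is an arbitrary value chosen for illustration, not a recommendation.

```python
import sys


def call_with_recursion_limit(func, limit, *args, **kwargs):
    """Call func(*args, **kwargs) with the recursion limit raised to
    at least `limit`, then restore the previous limit."""
    old_limit = sys.getrecursionlimit()
    # Never lower the limit below what is already configured.
    sys.setrecursionlimit(max(old_limit, limit))
    try:
        return func(*args, **kwargs)
    finally:
        sys.setrecursionlimit(old_limit)


if __name__ == '__main__':
    def count_down(n):
        # Deliberately recursive stand-in for a deeply nested tree.
        return 0 if n == 0 else 1 + count_down(n - 1)

    print(call_with_recursion_limit(count_down, 10000, 3000))
```

For the code in this post, text = call_with_recursion_limit(str, 10000, soup) would go through the same __str__ path as print soup. Setting the limit very high can crash the interpreter with a real stack overflow instead of raising an exception, so this is a stopgap and not an explanation of why the server and the Mac behave differently.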