MemoryError loading a large file into BeautifulStoneSoup


amadain

Dec 11, 2009, 5:20:41 AM
to beautifulsoup
I am trying to parse data in a large XML event log. The log has
150,000+ events. When I load it into BeautifulStoneSoup I get the
following error:

Traceback (most recent call last):
  File "BeautifulCount.py", line 16, in ?
    events=BeautifulStoneSoup(fTot)
  File "/usr/lib/python2.3/site-packages/BeautifulSoup.py", line 1141, in __init__
    self._feed(isHTML=isHTML)
  File "/usr/lib/python2.3/site-packages/BeautifulSoup.py", line 1174, in _feed
    markup = fix.sub(m, markup)
MemoryError

fTot is the concatenation of 15 XML event logs, each with 10,000
events.

Does BeautifulStoneSoup have limitations on the size of the data it
can parse? What are the upper limits? I will need to parse a lot more
than 150,000 events in the future (telecoms).

Thanks in advance

A

Leonard Richardson

Dec 11, 2009, 7:55:06 AM
to beauti...@googlegroups.com
Beautiful Soup is a DOM-style parser that builds an in-memory tree
proportional in size to the original document, so there is no fixed
upper limit: the ceiling is however much memory the process can get.
To parse arbitrarily large documents you need to use an event-based
parser like the one in xml.sax.
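A minimal sketch of the event-based approach with xml.sax, assuming
each log entry is an element named "event" (a hypothetical tag name;
substitute the real one) and that the logs live in separate files. The
handler keeps only a running count, so memory use stays flat no matter
how many events the logs contain. Each file is parsed on its own rather
than concatenated, because joining complete XML documents produces more
than one root element, which xml.sax rejects as not well-formed.

```python
import xml.sax


class EventCounter(xml.sax.ContentHandler):
    """Counts <event> elements as the parser streams through the input.

    Assumption: each log entry is an element named "event"; change
    EVENT_TAG to match the real log format.
    """

    EVENT_TAG = "event"  # hypothetical tag name

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        # Called once per opening tag; no tree is built, only the
        # running count is kept.
        if name == self.EVENT_TAG:
            self.count += 1


def count_events(paths):
    """Parse each log file separately and return the total event count."""
    handler = EventCounter()
    for path in paths:
        # Each file is a complete XML document with its own root element.
        xml.sax.parse(path, handler)
    return handler.count
```

Usage would look like `count_events(["log01.xml", "log02.xml"])`, where
the file names are hypothetical. To extract data rather than just count,
the same handler can collect attributes in startElement and text in
characters(), writing results out as it goes instead of holding them.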

Leonard