How can I use BeautifulSoup to parse a big HTML file, about 200 MB?

bingch...@foxmail.com

Oct 26, 2016, 7:41:34 AM
to beautifulsoup
The attached example is a 1.6 MB file.
I tried to parse it with BeautifulSoup:
soup = BeautifulSoup(html_path, 'lxml')
Everything is fine as long as the HTML is not very large, as with this file.
However, if the HTML is very large and every node contains many subtrees, parsing uses a great deal of memory.
I tested a 200 MB file with the same tree structure as the example file, and parsing it with BeautifulSoup took about 4 GB of memory.
To solve this, I want to use BeautifulSoup to parse the HTML file page by page or node by node, so that memory use does not get too high.
How can I do that?
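To make the "node by node" idea concrete: BeautifulSoup builds the whole parse tree in memory, so one possible approach is to split the file into small fragments first and only then hand each fragment to BeautifulSoup. The sketch below is a rough illustration using only the standard library's html.parser. The names FragmentSplitter and split_stream, and the choice of which tag to split on, are made up for this example and are not part of BeautifulSoup. It assumes the repeating unit in the file is a single known tag, and it drops comments.

```python
from html.parser import HTMLParser


class FragmentSplitter(HTMLParser):
    """Hypothetical helper: emit the markup of each `tag` element, one at a time."""

    def __init__(self, tag, on_fragment):
        # Keep character references as written so fragments stay valid markup.
        super().__init__(convert_charrefs=False)
        self.tag = tag
        self.on_fragment = on_fragment
        self.depth = 0   # nesting depth inside the current target element
        self.parts = []  # markup pieces of the fragment being collected

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1
        if self.depth:
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied through unchanged.
        if self.depth:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if not self.depth:
            return
        self.parts.append("</%s>" % tag)
        if tag == self.tag:
            self.depth -= 1
            if self.depth == 0:
                # One complete element collected: hand it over and free it.
                self.on_fragment("".join(self.parts))
                self.parts = []

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.depth:
            self.parts.append("&%s;" % name)

    def handle_charref(self, name):
        if self.depth:
            self.parts.append("&#%s;" % name)


def split_stream(fileobj, tag, on_fragment, chunk_size=64 * 1024):
    """Read a text-mode file in chunks and call on_fragment per `tag` element."""
    parser = FragmentSplitter(tag, on_fragment)
    for chunk in iter(lambda: fileobj.read(chunk_size), ""):
        parser.feed(chunk)
    parser.close()
```

With something like this, each fragment could be parsed on its own, for example by passing a callback that does BeautifulSoup(fragment, 'lxml') and extracts what is needed before returning, so that only one small tree is alive at a time. Whether this fits depends on the real structure of the 200 MB file, which I am only guessing at here.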
369-378.html