How can I use BeautifulSoup to parse a large HTML file, about 200 MB?

Oct 26, 2016, 7:41:34 AM

to beautifulsoup

The attached example is a 1.6 MB file.

I tried to parse it with BeautifulSoup:

from bs4 import BeautifulSoup
soup = BeautifulSoup(open(html_path), 'lxml')

Everything is fine when the HTML is not very large, as with this file.

However, if the HTML is very large and every node contains many subtrees, parsing can use a great deal of memory.

I tested a 200 MB file with the same tree structure as the example file, and parsing it with BeautifulSoup took about 4 GB of memory.

To solve this problem, I want to use BeautifulSoup to parse the HTML file page by page or node by node, so that memory usage does not get too high.
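
A minimal sketch of what node-by-node parsing could look like, assuming lxml's `etree.iterparse` with `html=True` is used to stream the file and BeautifulSoup only ever sees one small fragment at a time. The function name `iter_soups` and the `tag` argument are hypothetical, picked for illustration; the right tag depends on how the real file is structured, and this has not been measured against the 200 MB file.

```python
from bs4 import BeautifulSoup
from lxml import etree


def iter_soups(html_path, tag):
    """Yield one small BeautifulSoup object per <tag> element in the file.

    `tag` is an assumption: it should name an element that repeats in the
    document and is not nested inside another element of the same name.
    """
    # html=True makes iterparse use lxml's HTML parser; the default event
    # is "end", so each element is complete when it is handed to us.
    for _event, elem in etree.iterparse(html_path, tag=tag, html=True):
        # Serialize just this element (without trailing text) and let
        # BeautifulSoup build a tree for the fragment only.
        fragment = etree.tostring(
            elem, encoding="unicode", method="html", with_tail=False
        )
        yield BeautifulSoup(fragment, "lxml")

        # Release the element and any siblings that were already processed,
        # so the lxml tree does not keep growing.
        elem.clear()
        while elem.getprevious() is not None:
            del elem.getparent()[0]


if __name__ == "__main__":
    # Hypothetical usage: process every <div> of the attached file in turn.
    for soup in iter_soups("369-378.html", "div"):
        print(len(soup.get_text()))
```

With this shape, memory should depend on the size of the largest single fragment rather than on the whole file. If the chosen tag nests inside itself, the inner elements are cleared before the outer one is serialized, so the outer fragment would come out with those inner elements emptied.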

How can I do this?

369-378.html
