The example is a 1.6 MB file. I tried to parse it with BeautifulSoup:

soup = BeautifulSoup(open(html_path), 'lxml')

(BeautifulSoup needs a file object or markup string, not a path, so I open the file first.)
Everything is fine when the HTML is not very large, like this file. However, if the HTML is too large and every node has many subtrees, it can use a great deal of memory.
I tested a 200 MB file with a tree structure like the example file; parsing it with BeautifulSoup cost about 4 GB of memory.
To solve this problem, I want to parse the HTML file page by page or node by node, so memory use will not be too high. How can I do this?
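One direction I have been considering, as a sketch rather than a finished solution: instead of building the whole BeautifulSoup tree at once, stream the file through the stdlib html.parser, handle each target node as soon as its closing tag arrives, and then discard its text so memory stays flat. The <td> tag and the helper names below are just placeholders for whatever nodes the real file contains.

```python
import io
from html.parser import HTMLParser

class NodeStream(HTMLParser):
    """Collect the text of each <td> node, hand it to a callback,
    then discard it immediately so memory use stays flat."""
    def __init__(self, callback):
        super().__init__()
        self.callback = callback
        self.depth = 0   # nesting depth inside a <td>
        self.buf = []

    def handle_starttag(self, tag, attrs):
        if tag == 'td':
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == 'td' and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.callback(''.join(self.buf).strip())
                self.buf = []   # drop the finished node's text

    def handle_data(self, data):
        if self.depth:
            self.buf.append(data)

def stream_nodes(fileobj, callback, chunk_size=64 * 1024):
    """Feed the file to the parser in chunks instead of reading it whole."""
    parser = NodeStream(callback)
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        parser.feed(chunk)
    parser.close()

# Small in-memory demo standing in for the 200 MB file:
results = []
stream_nodes(io.StringIO('<table><tr><td>a</td><td>b</td></tr></table>'),
             results.append)
print(results)  # ['a', 'b']
```

If BeautifulSoup itself is required, another option I have seen is SoupStrainer (the parse_only argument), which only keeps the matching parts of the document in the tree, though the whole file is still read.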