Hi Everyone,
I am pretty new to BeautifulSoup but managed to get a scraper for NLRB filings [1] working after realizing that the lxml parser was required to properly parse the tables.
Unfortunately, I have run into what appears to be a prohibitive memory leak in the lxml calls when I parse thousands of documents in a loop within a single script: resident memory grows steadily across iterations even though nothing from one document is kept for the next.
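Roughly, the hot loop looks like the following (paths and the row extraction are simplified placeholders, not my exact code):

    import glob
    from bs4 import BeautifulSoup

    # Rough shape of the loop; memory grows steadily across iterations
    # even though no references to soup are retained between them.
    for path in glob.glob("filings/*.html"):  # thousands of saved NLRB filing pages
        with open(path, "rb") as f:
            soup = BeautifulSoup(f, "lxml")
        rows = [
            [cell.get_text(strip=True) for cell in tr.find_all(["td", "th"])]
            for tr in soup.find_all("tr")
        ]
        # ... rows are written out; soup goes out of scope here ...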
Is there a way to configure BeautifulSoup to pass 'collect_ids=False' when it constructs the lxml parser, sacrificing a bit of runtime for decreased memory usage?
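The only workaround I can see from reading the bs4 source is something like the sketch below, subclassing the XML tree builder and overriding parser_for (since collect_ids seems to be an option of lxml's XMLParser only). The method name and arguments are just my reading of the current source and may well be off across versions, which is why I would prefer a supported configuration knob if one exists:

    from bs4 import BeautifulSoup
    from bs4.builder import LXMLTreeBuilderForXML
    from lxml import etree


    class LowMemoryLXMLTreeBuilder(LXMLTreeBuilderForXML):
        # Hypothetical workaround: construct the underlying lxml parser
        # ourselves so that collect_ids=False can be passed and the XML ID
        # hash table is never populated.
        def parser_for(self, encoding):
            return etree.XMLParser(
                target=self,
                strip_cdata=False,
                recover=True,
                encoding=encoding,
                collect_ids=False,  # the option I am asking about
            )


    # Pass an explicit builder instance instead of the "lxml" feature string.
    with open("filings/example.html", "rb") as f:
        soup = BeautifulSoup(f, builder=LowMemoryLXMLTreeBuilder())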
With Thanks,
Jack Poulson