Efficient way to unpack an HTML table?


OlyDLG

Oct 16, 2016, 8:36:47 PM
to beautifulsoup
Hi, all. I have a 2000-row by 19-column HTML table from which I need to extract the text while preserving the table structure (I'm loading it into a NumPy recarray).

contents = [[item.text for item in row.find_all('td')] for row in table.find_all('tr')]

(where table has been extracted using soup.find('table')) works; I'm simply wondering if there's a more "Soup-onic" way.
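
For anyone reading along, here is a minimal self-contained sketch of the comprehension above. The sample HTML is a made-up stand-in for the real table, and the use of the built-in "html.parser" backend and get_text(strip=True) are assumptions on my part, not something the original setup is known to use.

```python
from bs4 import BeautifulSoup

# Hypothetical sample standing in for the real 2000 x 19 table.
html = """
<table>
  <tr><th>name</th><th>value</th></tr>
  <tr><td>a</td><td> 1 </td></tr>
  <tr><td>b</td><td>2</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table")

# One inner list per <tr>, one string per <td>, whitespace stripped.
contents = [
    [cell.get_text(strip=True) for cell in row.find_all("td")]
    for row in table.find_all("tr")
]

# Rows made up only of <th> cells yield empty lists; drop them.
contents = [row for row in contents if row]
```

If the header cells are wanted as well, row.find_all(['td', 'th']) should pick up both kinds of cell.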

Thanks!

OlyDLG

Oct 19, 2016, 11:45:05 PM
to beautifulsoup
Just a little update: I've switched to loading the nested list comprehension result into a pandas DataFrame, and that's working well.
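
A rough sketch of what that step might look like, in case it helps someone. The rows and column names here are invented for illustration. The explicit numeric conversion is an assumption about what the data needs, since the cell text always comes back as strings.

```python
import pandas as pd

# Hypothetical rows standing in for the nested list comprehension result.
contents = [["a", "1", "2.5"], ["b", "2", "3.5"]]
columns = ["name", "n", "score"]  # assumed column names, not from the real table

df = pd.DataFrame(contents, columns=columns)

# Every cell is a string at this point; convert the numeric columns explicitly.
df["n"] = pd.to_numeric(df["n"])
df["score"] = pd.to_numeric(df["score"])

# If a NumPy record array is still needed, the DataFrame can be converted.
rec = df.to_records(index=False)
```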