I would like to write a script which extracts the text content from similarly structured webpages (the documentation pages for Microsoft Visual Basic for Applications).
I understand that Beautiful Soup is a standard tool designed for precisely this, i.e. it can select a specific HTML node and extract its text.
However, I am curious: what is the most basic, ubiquitous underlying tool for the simple action of going to an HTML node and retrieving the text there? What is Beautiful Soup built on? I thought it might be XPath, which I took to be an XML parser, but so far it seems like XPath is used inside HTML documents rather than outside them.
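To make the question concrete, here is a rough sketch of the operation I have in mind, written with only `html.parser` from the Python standard library. The sample HTML, the `id` values, and the names `NodeTextExtractor` and `extract_text` are made up for illustration, and it assumes well-formed markup where every non-void element has a closing tag.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag; they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class NodeTextExtractor(HTMLParser):
    """Collects the text inside the first element whose id equals target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0       # nesting depth inside the target (0 means outside)
        self.done = False    # True once the target element has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.done or tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth += 1                      # a child of the target
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1                       # entering the target itself

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self.depth == 0:
            return
        self.depth -= 1
        if self.depth == 0:
            self.done = True                     # left the target element

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def extract_text(html, target_id):
    """Return the whitespace-normalized text of the element with target_id."""
    parser = NodeTextExtractor(target_id)
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.chunks).split())


# Hypothetical page, not taken from any real documentation site.
SAMPLE = """
<html><body>
<h1 id="title">Example page</h1>
<div id="main">
  <p>First paragraph with <b>bold</b> text.</p>
  <p>Second paragraph.</p>
</div>
</body></html>
"""

if __name__ == "__main__":
    print(extract_text(SAMPLE, "title"))
    print(extract_text(SAMPLE, "main"))
```

This works on my toy input, but it is a lot of bookkeeping for one lookup, which is why I am asking what the usual lower-level tool is.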
Thanks very much.