Dear Peter,
Thank you very much for your reply to my question.
I have two examples here where I am trying to extract just the text. So far I only have a mixed BeautifulSoup/regex approach, not a pure BeautifulSoup one:
##### 1 ####
<div id="Type" class="Link"><span class="SType">SType001</span>, distilled: <span id="distiled" class="Amount">100gs</span></div>
##### 1 ####
Here, I wish to extract the strings "SType001" and "100gs".
##### 2 ####
<ul class="specsNA">
<li class=""> <span class="Ktype">KTypeNA</span></li><li class=""> <span class="Ltype">LTypeNA</span></li><li class=""> <span class="MType">MTypeNA</span></li><li class=""> <span class="NType">NTypeNA</span></li>
</ul>
##### 2 ####
Here, I wish to extract "KTypeNA", "LTypeNA", "MTypeNA" and "NTypeNA".
At the moment, I look up the div by its id, or by its id and class (using a lambda, as you suggested), to narrow things down as far as possible, and then use a regex (re.compile) to extract the text. This does not look elegant to me (I am not a real programmer; I have a textile engineering degree), and I was wondering whether there is a better solution, since I wish to parse thousands of pages. Also, some posts on the net advise against using regex to parse or extract information from HTML. I am unsure about this, so I would like to avoid regex as much as possible.
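To make the description above concrete, here is a minimal sketch of that kind of mixed approach, run on the two examples. It assumes BeautifulSoup 4 (the bs4 package) with the built-in "html.parser". The names extract_span_texts, SPAN_TEXT, EXAMPLE_1 and EXAMPLE_2 are made up for illustration and are not my exact code.

```python
import re

from bs4 import BeautifulSoup  # assumption: BeautifulSoup 4 is installed

# The two snippets from this post, copied as-is.
EXAMPLE_1 = (
    '<div id="Type" class="Link"><span class="SType">SType001</span>, '
    'distilled: <span id="distiled" class="Amount">100gs</span></div>'
)
EXAMPLE_2 = (
    '<ul class="specsNA">\n'
    '<li class=""> <span class="Ktype">KTypeNA</span></li>'
    '<li class=""> <span class="Ltype">LTypeNA</span></li>'
    '<li class=""> <span class="MType">MTypeNA</span></li>'
    '<li class=""> <span class="NType">NTypeNA</span></li>\n'
    '</ul>'
)

# Hypothetical pattern: capture whatever sits between <span ...> and </span>.
SPAN_TEXT = re.compile(r"<span[^>]*>([^<]*)</span>")


def extract_span_texts(html, match):
    """Find the first tag accepted by `match`, then regex out its span texts."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find(match)  # step 1: BeautifulSoup narrows down the block
    if container is None:
        return []
    # step 2: regex on the re-serialised markup (the part that feels inelegant)
    return SPAN_TEXT.findall(str(container))


if __name__ == "__main__":
    print(extract_span_texts(
        EXAMPLE_1,
        lambda tag: tag.name == "div" and tag.get("id") == "Type",
    ))
    print(extract_span_texts(
        EXAMPLE_2,
        lambda tag: tag.name == "ul" and "specsNA" in tag.get("class", []),
    ))
```

This gives the strings I am after for both examples, but the second step turns the parsed tree back into a string only to search it with a regex, and that is the part I would like to replace.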
Thank you for your time.
Best regards,
John.