I'm not 100% sure what you're asking, but my advice is to parse the markup
with lxml instead of html5lib. Here's how html5lib parses the markup
you gave me:
I'm guessing your problem has to do with the fact that html5lib loses
Here's how lxml parses the same markup:
Note that the <th> and <td> tags are preserved.
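
If you'd rather repair the malformed tag yourself before handing the
markup to a parser, here's a minimal sketch. It is not something Beautiful
Soup does for you, just a plain string substitution with Python's re
module. It assumes every <tr"> should have been <tr class="even ">, as you
described; the helper name repair_tr_tags and the sample row are made up
for illustration.

```python
import re

# Matches the malformed opening tag exactly as quoted in this thread: <tr">
# (a stray double quote directly after the tag name).
MALFORMED_TR = re.compile(r'<tr"\s*>', re.IGNORECASE)


def repair_tr_tags(markup, replacement='<tr class="even ">'):
    """Replace every malformed <tr"> with a well-formed opening tag.

    Assumption: the broken rows were meant to carry class="even ".
    """
    return MALFORMED_TR.sub(replacement, markup)


# Hypothetical sample row; the real page's markup is not shown here.
sample = '<table><tr"><th scope=col>Name</th><td>Value</td></tr></table>'
repaired = repair_tr_tags(sample)
print(repaired)
```

The repaired string can then be parsed as usual, for example with
BeautifulSoup(repaired, "lxml"). Well-formed <tr class="even "> tags don't
match the pattern, so they pass through unchanged.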
On Mon, Jul 30, 2012 at 12:19 PM, Tom <boot...@gmail.com> wrote:
> Hey Leonard,
> So I'm now parsing with html5lib and it is working. However, that <tr">
> tag... it turns out I need the text from that, go figure. Typically that
> tag looks like <tr class="even "> and I've been getting the text from it
> easily. However, there are multiple instances where that <tr class="even ">
> looks like this: <tr">. I am not sure if it's a server error or what, but all
> the data/text associated with that class is still there; it's just preceded by
> a malformed tag. Below is an example of a good <tr class="even "> vs. a
> <tr"> tag:
> Good: <tr class="even "><th scope=col><a
> Is there any way to fix or replace that malformed tag?
> I was looking around here in your documentation:
> On Tuesday, July 24, 2012 10:16:18 AM UTC-4, Leonard Richardson wrote:
>> On Tue, Jul 24, 2012 at 9:22 AM, Tom <boot...@gmail.com> wrote:
>> > is this an instance where beautifulsoup can't parse a page and I need to
>> You should tell Beautiful Soup to use the lxml parser instead of
>> "HTMLParser.HTMLParseError: malformed start tag or