From: Leonard Richardson <leona...@segfault.org>
Date: Mon, 30 Jul 2012 13:34:21 -0400
Subject: Re: <tr"> tag error
I'm not 100% sure what you're asking, but my advice is to parse the markup
with lxml instead of html5lib. Here's how html5lib parses the markup you gave me:
<html>
I'm guessing your problem has to do with the fact that html5lib loses
Here's how lxml parses the same markup:
<html>
Note that the <th> and <td> tags are preserved.
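
For reference, a minimal sketch of the lxml suggestion, plus an optional pre-cleaning step for the malformed tag. The sample markup and the helper names `repair_tr` and `parse` are hypothetical, made up for illustration. The regex step is an assumption about one way to normalize the tag before parsing and is not a Beautiful Soup feature. `parse` assumes the `beautifulsoup4` and `lxml` packages are installed.

```python
import re

# Hypothetical sample modeled on the markup described in this thread.
SAMPLE = '<table><tr"><th scope=col>Name</th><td>42</td></tr></table>'


def repair_tr(markup, replacement="<tr>"):
    """Replace every malformed <tr"> start tag with `replacement`.

    Well-formed tags such as <tr class="even "> are left untouched.
    """
    return re.sub(r'<tr"\s*>', replacement, markup)


def parse(markup, parser="lxml"):
    """Parse markup with Beautiful Soup, using lxml by default."""
    # Imported here so repair_tr() still works without bs4 installed.
    from bs4 import BeautifulSoup
    return BeautifulSoup(markup, parser)


if __name__ == "__main__":
    cleaned = repair_tr(SAMPLE)
    soup = parse(cleaned)
    for row in soup.find_all("tr"):
        print(row.get_text(" ", strip=True))
```

Passing `replacement='<tr class="even ">'` to `repair_tr` would restore the class the tag normally carries, if later code selects rows by class.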
Leonard
On Mon, Jul 30, 2012 at 12:19 PM, Tom <boot...@gmail.com> wrote:
> Hey Leonard,
> So I'm now parsing with html5lib and it is working... However, that <tr">
> tag.... it turns out I need the text from that, go figure. Typically that
> tag looks like <tr class="even "> and I've been getting the text from it
> easily... However, there are multiple instances where that <tr class="even ">
> looks like this: <tr">. I am not sure if it's a server error or what, but all
> the data/text associated with that class is still there; it's just preceded by
> a malformed tag.... Below is an example of a good <tr class="even "> vs. a
> <tr"> tag:
> Good: <tr class="even "><th scope=col><a
> Is there any way to fix or replace that malformed tag?
> I was looking around here in your documentation:
> Thanks,
> On Tuesday, July 24, 2012 10:16:18 AM UTC-4, Leonard Richardson wrote:
>> On Tue, Jul 24, 2012 at 9:22 AM, Tom <boot...@gmail.com> wrote:
>> > is this an instance where beautifulsoup can't parse a page and I need to
>> You should tell Beautiful Soup to use the lxml parser instead of
>> http://www.crummy.com/software/BeautifulSoup/bs4/doc/#other-parser-pr...
>> "HTMLParser.HTMLParseError: malformed start tag or
>> Leonard
> --
> To post to this group, send email to beautifulsoup@googlegroups.com.