From: Leonard Richardson <leona...@segfault.org>
Date: Sun, 18 Jan 2009 11:10:18 -0500
Subject: Re: BeautifulSoup choking on quotation mark typo
I thought I'd posted this to the list, but it was actually a private
email. This is my general stance on this kind of problem: low-level HTML problems like this are not something I can fix.
I chose to switch to HTMLParser so that Beautiful Soup could run under
My plan for handling this is to make the underlying parser pluggable.
Basically, I want to get out of the business of writing parsers and
In the meantime, you have three options:
1. Pre-process the data so that HTMLParser can handle it.
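
For the specific typo reported in this thread (an extra double quote after an attribute value), pre-processing could look roughly like the sketch below. The helper name strip_stray_quotes and the regular expression are illustrative assumptions, not part of Beautiful Soup or HTMLParser, and the pattern only targets this one kind of damage. It is applied to the whole string, so it may also touch look-alike text inside scripts or page content.

```python
import re

# Hypothetical helper for illustration; not part of Beautiful Soup.
# Matches a double-quoted attribute value that is followed by one or more
# stray quotes, where the next character is whitespace, "/" or ">".
_STRAY_QUOTE = re.compile(r'(=\s*"[^"<>]*")"+(?=[\s/>])')


def strip_stray_quotes(markup):
    """Drop the extra quote(s) after a quoted attribute value."""
    return _STRAY_QUOTE.sub(r'\1', markup)


broken = '<div align="left""><strong>Next page:</strong>'
fixed = strip_stray_quotes(broken)
print(fixed)
```

The cleaned string can then be handed to Beautiful Soup as usual. Empty attribute values such as alt="" followed by a space are left alone, because the pattern requires an additional quote after the closing one.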
Leonard
On Sun, Jan 18, 2009 at 7:40 AM, Jonathan
<jonathan.north.washing...@gmail.com> wrote:
> David, you should be aware of the following—straight from
> "You didn't write that awful page. You're just trying to get some data
> The whole reason not to use regexes is that there are always
> Christian, This appears to be a bug introduced with using HTMLParser
> --
> On Jan 5, 6:23 pm, "David Barnett" <daviebd...@gmail.com> wrote:
>> David
>> On Sun, Jan 4, 2009 at 10:08 PM, Christian <kreib...@gmail.com> wrote:
>> > Hi all,
>> > I have BS choking on this content ...
>> > <div align="left""><strong>Next page:</strong> [...]
>> > (note the double quotation marks) ... with a:
>> > File "/usr/lib/python2.5/site-packages/BeautifulSoup.py", line 1261,
>> > Is there an easy way to make BS tolerate this problem and soldier on?
>> > Thanks,