All of a sudden, BeautifulSoup code that has worked for years can't extract tags?


ssteinerX

Oct 5, 2009, 12:42:37 AM
to beautifulsoup
Hi!

Long time user, first time poster...I'm kind of freaking out here...

I've got a project on a Windows server, just trying to extract <a
href=...> tags with:

from BeautifulSoup import BeautifulSoup, SoupStrainer

A_TAG_STRAINER = SoupStrainer('a')

a_tags = BeautifulSoup(content, parseOnlyThese=A_TAG_STRAINER)

Only, even though the content's got a bunch of <a> tags, a_tags is
coming up empty.

The same code has worked for eons on all of my Linux sites which is
why I'm even mentioning that it's a Windows site.

I've looked at the "content" passed into BeautifulSoup and it looks,
to the naked eye, fine.

Anyone have any idea what I should look for next?

Of course, this is when I'm supposed to be verifying links on a new
site being deployed for a client; only waiting for the go-ahead from
us...

Thanks!

S

VanB

Oct 11, 2009, 8:37:04 PM
to beautifulsoup
I'm having some parser errors and came across this note. Maybe it's
causing the problem you're experiencing...

http://www.crummy.com/software/BeautifulSoup/3.1-problems.html

In short, maybe you are using a 3.0+ parser on your Windows machine
while using an older version on your Linux machines. The link
explains the problem; your option to make it work again is to
downgrade to an earlier version and run it under Python 2.6 or earlier,
not Python 3.0, which dropped the SGML parser that older versions used.
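
One way to test the version theory is to run the same small report on the Windows box and on a Linux box and compare the output. This is only a sketch: `describe_environment` is a made-up helper name, and it assumes the library is importable under its 3.x module name `BeautifulSoup` and exposes a `__version__` attribute (anything without one is reported as "unknown").

```python
import sys


def describe_environment(module_names=("BeautifulSoup",)):
    """Return the Python version plus the version of each named module.

    A module that cannot be imported is reported as None.
    """
    report = {"python": "%d.%d.%d" % sys.version_info[:3]}
    for name in module_names:
        try:
            module = __import__(name)
        except ImportError:
            report[name] = None  # not installed on this machine
        else:
            # Fall back to "unknown" if the module has no __version__.
            report[name] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    for key, value in sorted(describe_environment().items()):
        print("%s: %s" % (key, value))
```

If the two machines print different BeautifulSoup versions, that difference is the first thing to rule out before looking at the content itself.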

sste...@gmail.com

Oct 11, 2009, 8:51:15 PM
to beauti...@googlegroups.com

On Oct 11, 2009, at 8:37 PM, VanB wrote:

>
> I'm having some parser errors and came across this note. Maybe it's
> causing the problem you're experiencing...
>
> http://www.crummy.com/software/BeautifulSoup/3.1-problems.html
>
> In short, maybe you are using a 3.0+ parser on your Windows machine
> while using an older version on your Linux machines. The link
> explains the problem; your option to make it work again is to
> downgrade to an earlier version and run it under Python 2.6 or earlier,
> not Python 3.0, which dropped the SGML parser that older versions used.

While part of this may be valid, I made the same mistake.

3.0.x and earlier use the old Python 2.x SGML parser (SGMLParser)

3.1+ uses the newer, less forgiving parser (HTMLParser), which is also available in Python 3

S
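
Since 3.1+ sits on top of the standard library's HTMLParser, a rough cross-check is to feed the same content straight to that parser and see whether it finds any links at all. This is a sketch only, not the BeautifulSoup API: `LinkCollector` and `collect_hrefs` are made-up names, and it assumes `content` is already a decoded string.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> start tag seen."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names before calling this.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


def collect_hrefs(content):
    """Return all href values found in <a> tags, in document order."""
    collector = LinkCollector()
    collector.feed(content)
    collector.close()
    return collector.hrefs


if __name__ == "__main__":
    sample = '<p><a href="/one">one</a> <A HREF="/two">two</A></p>'
    print(collect_hrefs(sample))
```

If this finds links in content where a_tags comes back empty, the content is probably fine and the difference lies in the BeautifulSoup version or parser. If it finds nothing either, the content itself is the next thing to inspect.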
