I'm using SoupStrainer and can't find the title tag on
http://www.digitalpoint.com/
The whole main page is on a single line, with no newlines.
Code excerpt:
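from BeautifulSoup import BeautifulSoup, SoupStrainer
import logging

# result.content comes from urlfetch, the URL-fetching API on Google App Engine
tags = []
try:
    links = SoupStrainer(u'title')
    for tag in BeautifulSoup(result.content, parseOnlyThese=links):
        tags.append(tag)
        if DEBUG:
            logging.info('tag is ' + str(tag))
except Exception:
    # placeholder: the original except clause is not part of this excerpt
    logging.exception('could not parse page')

Start of the page source (result.content):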
<! (C) Copyright 1996-2006 Digital Point Solutions - No portion of
this site may be reproduced in ANY form.><html><head><title>Digital
Point Solutions</title><meta name="description" content="Offering
business software packages and free online tools."><meta
name="keywords" content=""><meta http-equiv="Content-Type"
content="text/html; charset=UTF-8" /> <link rel="StyleSheet"
type="text/css" href="/style.css"></head><body background="./images/
background_1.gif">...deleted
I've tried trimming down the size of the file, but that has the
(presumably) bad effect of cutting off the closing tags.
So I'm going to try using a regular expression to strip everything
in front of the <html> tag.
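
A minimal sketch of that regular-expression idea, using only the standard re module. The helper name strip_before_html and the shortened sample string are assumptions for illustration; in the real code the input would be result.content. Whether this makes the title lookup succeed is untested here.

```python
import re

def strip_before_html(markup):
    """Return markup starting at the first <html tag, or unchanged if none is found."""
    # \b keeps the pattern from matching tags that merely start with "html"
    match = re.search(r'<html\b', markup, re.IGNORECASE)
    return markup[match.start():] if match else markup

# Shortened stand-in for the page source quoted above (hypothetical sample)
sample = ('<! (C) Copyright 1996-2006 Digital Point Solutions - No portion of '
          'this site may be reproduced in ANY form.>'
          '<html><head><title>Digital Point Solutions</title></head>'
          '<body></body></html>')

trimmed = strip_before_html(sample)
```

The trimmed string could then be passed to BeautifulSoup in place of result.content, leaving the closing tags intact.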