Using FeedController get_feed_data, mysterious "return unicode(value, encoding) type error: function takes exactly 5 arguments (1 given)"


buffalob

Jun 26, 2007, 10:06:07 PM
to TurboGears
I'm using FeedController with get_feed_data, successfully for the most
part, but very intermittently I get the following mysterious crash
(and I say mysterious because my code does not directly call anything
listed below, so I've no idea why the "5 args expected, 1 provided"
should be happening).

I invoke from my web browser as follows:

http://localhost:8080/feed/rss2.0

...and then here's what happens (and I'll post my code in a response
momentarily):

500 Internal error

The server encountered an unexpected condition which prevented it from
fulfilling the request.

Page handler: <bound method Feed.rss2_0 of <hello.controllers.Feed object at 0x01300310>>
Traceback (most recent call last):
  File "c:\python24\lib\site-packages\CherryPy-2.2.1-py2.4.egg\cherrypy\_cphttptools.py", line 105, in _run
    self.main()
  File "c:\python24\lib\site-packages\CherryPy-2.2.1-py2.4.egg\cherrypy\_cphttptools.py", line 254, in main
    body = page_handler(*virtual_path, **self.params)
  File "<string>", line 3, in rss2_0
  File "c:\python24\lib\site-packages\TurboGears-1.0.2.2-py2.4.egg\turbogears\controllers.py", line 334, in expose
    output = database.run_with_transaction(
  File "<string>", line 5, in run_with_transaction
  File "c:\python24\lib\site-packages\TurboGears-1.0.2.2-py2.4.egg\turbogears\database.py", line 303, in so_rwt
    retval = func(*args, **kw)
  File "<string>", line 5, in _expose
  File "c:\python24\lib\site-packages\TurboGears-1.0.2.2-py2.4.egg\turbogears\controllers.py", line 351, in <lambda>
    mapping, fragment, args, kw)))
  File "c:\python24\lib\site-packages\TurboGears-1.0.2.2-py2.4.egg\turbogears\controllers.py", line 391, in _execute_func
    return _process_output(output, template, format, content_type, mapping, fragment)
  File "c:\python24\lib\site-packages\TurboGears-1.0.2.2-py2.4.egg\turbogears\controllers.py", line 82, in _process_output
    fragment=fragment)
  File "c:\python24\lib\site-packages\TurboGears-1.0.2.2-py2.4.egg\turbogears\view\base.py", line 131, in render
    return engine.render(**kw)
  File "c:\python24\lib\site-packages\TurboKid-1.0.1-py2.4.egg\turbokid\kidsupport.py", line 192, in render
    return t.serialize(encoding=self.defaultencoding, output=format, fragment=fragment)
  File "c:\python24\lib\site-packages\kid-0.9.5-py2.4.egg\kid\__init__.py", line 299, in serialize
    raise_template_error(module=self.__module__)
  File "c:\python24\lib\site-packages\kid-0.9.5-py2.4.egg\kid\__init__.py", line 297, in serialize
    return serializer.serialize(self, encoding, fragment, format)
  File "c:\python24\lib\site-packages\kid-0.9.5-py2.4.egg\kid\serialization.py", line 105, in serialize
    text = ''.join(self.generate(stream, encoding, fragment, format))
  File "c:\python24\lib\site-packages\kid-0.9.5-py2.4.egg\kid\serialization.py", line 343, in generate
    for ev, item in self.apply_filters(stream, format):
  File "c:\python24\lib\site-packages\kid-0.9.5-py2.4.egg\kid\serialization.py", line 163, in format_stream
    for ev, item in stream:
  File "c:\python24\lib\site-packages\kid-0.9.5-py2.4.egg\kid\parser.py", line 219, in _coalesce
    for ev, item in stream:
  File "c:\python24\lib\site-packages\kid-0.9.5-py2.4.egg\kid\parser.py", line 177, in _track
    for p in stream:
  File "c:\python24\lib\site-packages\kid-0.9.5-py2.4.egg\kid\filter.py", line 24, in apply_matches
    for ev, item in stream:
  File "c:\python24\lib\site-packages\kid-0.9.5-py2.4.egg\kid\parser.py", line 177, in _track
    for p in stream:
  File "c:\python24\lib\site-packages\kid-0.9.5-py2.4.egg\kid\parser.py", line 227, in _coalesce
    text += to_unicode(value, encoding)
  File "c:\python24\lib\site-packages\kid-0.9.5-py2.4.egg\kid\parser.py", line 204, in to_unicode
    return unicode(value, encoding)
type error: function takes exactly 5 arguments (1 given)

buffalob

Jun 26, 2007, 10:06:59 PM
to TurboGears
Here's my code that causes the above:

from turbogears import controllers, expose, flash, redirect
from turbogears.feed import FeedController

test_string = "default"
input_feed_url_string = "none_yet"

import logging
log = logging.getLogger("hello.controllers")

import xml.sax.handler
import sgmllib

class ParseError(Exception):
    pass

class HTML_Stripper(sgmllib.SGMLParser):
    def __init__(self):
        sgmllib.SGMLParser.__init__(self)

    def strip(self, some_html):
        self.theString = ""
        self.feed(some_html)
        self.close()
        return self.theString

    def handle_data(self, data):
        self.theString += data

class Feed(FeedController):

    def get_feed_data(self, **kwargs):
        input_feed_url_string = "http://rss.news.yahoo.com/rss/health"
        feed_title = "modified version"
        html_stripper = HTML_Stripper()
        entries = []
        import feedparser
        feed_data = feedparser.parse(input_feed_url_string)
        for e in feed_data.entries:
            item = {}
            item["title"] = "some title"
            from datetime import datetime
            item["published"] = datetime.now()
            item["updated"] = datetime.now()
            item["author"] = "B"
            safe_summary = e.summary_detail.value.encode('ascii', 'ignore')
            modified_summary = safe_summary
            log.error("safe_summary=" + safe_summary)
            modified_summary = html_stripper.strip(safe_summary)
            log.error("modified_summary=" + modified_summary)
            item["summary"] = modified_summary
            item["link"] = e.link
            entries.append(item)
        return dict(
            title = feed_title, link = "http://some_link.com",
            author = {"name": "B", "email": "test@some_link.com"},
            subtitle = "info", id = "http://id_link.com", entries = entries)

class Root(controllers.RootController):
    feed = Feed()

    def __init__(self):
        controllers.RootController.__init__(self)

    @expose(template="hello.templates.welcome")
    def index(self):
        import time
        log.debug("TurboGears Controller Responding For Duty")
        flash("index called... Your application is now running")
        return dict(now=time.ctime())

    @expose(template="hello.templates.hello")
    def hello(self, *args, **kwargs):
        return dict(greeting="Greetings again from the Controller")

buffalob

Jun 26, 2007, 10:11:26 PM
to TurboGears
One more quick note about the "intermittently" aspect:
It seems likely that the intermittence is due to varying data content
in this RSS feed, which my code reads via the "feedparser" Python
module (Mark Pilgrim's well-known open source RSS reading software,
Universal Feed Parser):

http://rss.news.yahoo.com/rss/health

Perhaps some data in the above feed occasionally triggers the "5 args
vs. 1 arg" thing in a behind-the-scenes Unicode processing step of
TG.
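
A possible explanation for the odd wording of the error, offered as a hedged sketch rather than a confirmed diagnosis: UnicodeDecodeError is unusual in that its constructor requires five arguments (encoding, object, start, end, reason). If some layer catches the original decoding failure and rebuilds the exception from a single message string, Python reports a TypeError about the argument count instead of the real problem. The traceback earlier in this thread passes through raise_template_error in Kid, which makes this plausible, but that Kid does exactly this is an assumption. The helper name reraise_with_message is hypothetical, and the snippet uses modern Python 3 syntax rather than the Python 2.4 of the thread.

```python
def reraise_with_message(exc):
    """Hypothetical helper: rebuild an exception of the same class from one string.

    This works for most exception classes, but UnicodeDecodeError insists on
    five constructor arguments, so the rebuild itself fails with a TypeError.
    """
    raise exc.__class__("template error: %s" % exc)


def demo():
    try:
        # 0x97 is not a valid UTF-8 sequence on its own, so decoding fails.
        b"education \x97 fresh".decode("utf-8")
    except UnicodeDecodeError as original:
        try:
            reraise_with_message(original)
        except TypeError as masking:
            # The TypeError hides the real cause (the decoding failure).
            return type(original).__name__, type(masking).__name__, str(masking)
    return None


result = demo()
print(result)
```

If this reading is right, the "5 arguments" message is only a symptom, and the underlying failure is a byte string that cannot be decoded with the template engine's encoding.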

buffalob

Jul 4, 2007, 10:42:18 PM
to TurboGears
The "type error: function takes exactly 5 arguments (1 given)" crash I
was getting a couple weeks ago has returned today, and I think I've
narrowed down the data that is causing it to happen.

As before, the crash occurs in some internal TG functions for KID
processing (none of which I have ever changed at all) toward the
latter part of the call to my "get_feed_data" function in my
FeedControllers module, ending with the following lines (full set of
lines posted previously in this thread so I won't repeat them here):
...


\parser.py", line 227, in _coalesce
text += to_unicode(value, encoding)
File "c:\python24\lib\site-packages\kid-0.9.5-py2.4.egg\kid
\parser.py", line 204, in to_unicode
return unicode(value, encoding)
type error: function takes exactly 5 arguments (1 given)

Processing the following snippet of XML seems to cause the crash.
It's a portion of an RSS news feed, and when I put some debug
conditional processing to skip over this snippet then the crash does
not happen.

I am guessing that the offending character is the "&amp;#151;" that
follows the word "education".

<item>
  <title>Review finds nutrition education failing (AP)</title>
  <link>http://us.rd.yahoo.com/dailynews/rss/health/*http://news.yahoo.com/s/ap/20070704/ap_on_he_me/failing_to_fight_fat</link>
  <guid isPermaLink="false">ap/20070704/failing_to_fight_fat</guid>
  <pubDate>Wed, 04 Jul 2007 21:06:50 GMT</pubDate>
  <description>AP - The federal government will spend more than &#36;1
billion this year on nutrition education &amp;#151; fresh carrot and
celery snacks, videos of dancing fruit, hundreds of hours of lively
lessons about how great you will feel if you eat well.</description>
</item>

I notice that in a browser the "&amp;#151;" sequence is displayed as a
long dash.
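
The long dash is consistent with code 151 (0x97) being read as Windows-1252, where that position is the em dash (U+2014). Here is a small sketch, in modern Python 3, of how that single byte behaves under three encodings. It only illustrates the encodings and is not taken from the feed-processing code.

```python
raw = b"\x97"  # the single byte with value 151

# Windows-1252 assigns 0x97 to the em dash, U+2014.
as_cp1252 = raw.decode("cp1252")

# ISO 8859-1 maps every byte to the code point of the same value,
# so 0x97 becomes the invisible C1 control character U+0097.
as_latin1 = raw.decode("latin-1")

# On its own, 0x97 is a UTF-8 continuation byte and cannot be decoded.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

print(repr(as_cp1252), repr(as_latin1), utf8_ok)
```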

Can anyone please offer me any suggestions for work-arounds I can add
to my "get_feed_data" function so that when some external RSS feed I
process happens to have a character like this it can recover and
proceed without crashing?
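
One possible work-around, sketched under assumptions: keep the summary as Unicode text from start to finish, and let the tag stripper resolve numeric character references itself. The sketch uses Python 3's html.parser.HTMLParser as a stand-in for sgmllib (which is not available in Python 3); the class name TextOnly and the function strip_html are hypothetical. With convert_charrefs=True, references are resolved using the HTML5 rules, under which "&#151;" becomes the em dash U+2014 rather than a raw byte 151.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect only the text content of an HTML fragment (hypothetical helper)."""

    def __init__(self):
        # convert_charrefs=True resolves &#151; and friends before handle_data.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(fragment):
    """Return the fragment's text as a Unicode string with tags removed."""
    parser = TextOnly()
    parser.feed(fragment)
    parser.close()  # flush any text still buffered in the parser
    return "".join(parser.parts)


summary = "nutrition education &#151; fresh <b>carrot</b> and celery snacks"
clean = strip_html(summary)
print(clean)
```

In get_feed_data this would mean passing e.summary_detail.value to the stripper directly instead of encoding it to ASCII first. Whether the TurboGears 1.0 feed templates accept Unicode summaries unchanged has not been verified here.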

Thanks much in advance for any help. This problem is beyond the outer
edge of my Python expertise, but I hope the solution can help me
advance that a bit and also make my project perform much more
reliably.

Researching this a little, I found this discussion of character codes
in the range 128-159, which includes 151, so maybe that has something
to do with this?
http://www.cs.tut.fi/~jkorpela/chars.html#win
"In the Windows character set, some positions in the range 128 - 159
are assigned to printable characters, such as "smart quotes", em dash,
en dash, and trademark symbol. Thus, the character repertoire is
larger than ISO Latin 1. The use of octets in the range 128 - 159 in
any data to be processed by a program that expects ISO 8859-1 encoded
data is an error which might cause just anything. They might for
example get ignored, or be processed in a manner which looks
meaningful, or be interpreted as control characters. See my document
On the use of some MS Windows characters in HTML for a discussion of
the problems of using these characters."
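
To make the quoted range concrete, here is a minimal sketch in modern Python 3 that asks the standard cp1252 codec which positions from 128 to 159 carry a character. The function name cp1252_printables is hypothetical, and the sketch relies on Python's codec raising an error for the positions it treats as undefined.

```python
def cp1252_printables():
    """Map each byte value 128-159 to its Windows-1252 character, if any."""
    table = {}
    for code in range(128, 160):
        try:
            table[code] = bytes([code]).decode("cp1252")
        except UnicodeDecodeError:
            # Python's cp1252 codec treats a few positions as undefined.
            continue
    return table


table = cp1252_printables()
for code, char in sorted(table.items()):
    print(code, "U+%04X" % ord(char))
```

Position 151 maps to U+2014 (em dash) and 150 to U+2013 (en dash), which matches the characters named in the quotation.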
