At least accept unicode for input. If you return utf-8 that's fine if it's
a documented behaviour.
A simple
if isinstance(input_string, unicode): input_string = input_string.encode('utf-8')
should suffice.
The intended behaviour is to accept either str (assumed to be UTF-8
encoded) or unicode (which is converted internally to a UTF-8 str). All
strings are returned as UTF-8 encoded "str"s. If you find a place in the
API where this isn't true, please comment here.
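
For anyone checking their own call sites, here is a minimal sketch of the normalisation described above. The helper name to_utf8 is hypothetical and is not part of the API; the version check is only there so the snippet also runs on Python 3, where the text type is str rather than unicode.

```python
import sys

# On Python 2 the text type is `unicode`; on Python 3 it is `str`.
if sys.version_info[0] >= 3:
    text_type = str
else:
    text_type = unicode  # noqa: F821 (name only exists on Python 2)


def to_utf8(value):
    """Return `value` as a UTF-8 encoded byte string (hypothetical helper)."""
    if isinstance(value, text_type):
        # Text input: encode it explicitly and return the result.
        return value.encode('utf-8')
    # Byte-string input: assumed (not verified) to already be UTF-8.
    return value
```

Note that encode() returns a new object rather than modifying the string in place, so the result has to be returned or assigned. Byte strings are passed through unchanged, so a str in some other encoding would not be detected by this sketch.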