First off, thanks for the brilliant engine that Brevé is!
I believe I've found a bug. Doing something like the following
produces an error:
    import sys
    from breve.tags.html import tags as T
    sys.stdout.write(unicode(T.html [
        T.head [
            T.title [
                'Hello'
            ],
        ],
        T.body [
            T.h1 [
                'Title'
            ],
            u'Some \u20ac text'
        ]
    ]).encode("utf-8"))
This will produce the error:
UnicodeEncodeError: 'ascii' codec can't encode character u'\u20ac' in
position 66: ordinal not in range(128)
As far as I can tell, this is because the file
"breve/tags/__init__.py" defines a __str__ method for the "Tag" class, which
converts the object to a unicode string. HOWEVER, that does not work
as one might expect, because a call to str() or unicode() does NOT
simply return whatever the __str__ method returns. It converts it to a
str string first (using the default encoding, which is ascii in my
case).
The "breve/tags/__init__.py" file should define a __unicode__ method
for the "Tag" class as well, which should do exactly the same as the
__str__ method. But now, when one calls the unicode() function for a
Tag object, it will use the __unicode__ method instead, which DOES
return a unicode string instead of a str string.
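
A minimal sketch of this suggestion, using a hypothetical stand-in Tag class (not Brevé's actual code). It defines __unicode__ next to __str__. Two things are assumptions that are not in the proposal above: the `unicode = str` shim, added only so the same snippet also runs on Python 3, and the explicit UTF-8 encode in __str__ on Python 2.

```python
import sys

if sys.version_info[0] >= 3:
    unicode = str  # assumption: shim so the sketch also runs on Python 3


class Tag(object):
    """Hypothetical stand-in for Breve's Tag class, for illustration only."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def __unicode__(self):
        # Always build and return a real unicode string.
        inner = u"".join(unicode(child) for child in self.children)
        return u"<%s>%s</%s>" % (self.name, inner, self.name)

    def __str__(self):
        text = self.__unicode__()
        if sys.version_info[0] >= 3:
            return text  # str is already unicode on Python 3
        # On Python 2, str() must yield bytes, so encode explicitly
        # instead of relying on the default (ASCII) codec.
        return text.encode("utf-8")


page = Tag("body", [Tag("h1", ["Title"]), u"Some \u20ac text"])
rendered = unicode(page)
```

With this in place, unicode(page) never passes through the default codec, and the caller decides on the output encoding with rendered.encode("utf-8").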
I hope this helps!
Sven
On Sep 27, 10:16 pm, Cliff Wells <cl...@twisty-industries.com> wrote:
> I'm working on improving Unicode support but frankly it's not an area I
> understand fully (or use enough to catch problems). I'll try your
> suggestions. Thanks for the feedback!
>
> Regards,
> Cliff
>
> On Wed, 2007-09-26 at 12:51 +0000, cbrain wrote:
> > Hello,
>
> > First off, thanks for the brilliant engine that Brevé is!
What's not clear to me is what happens (or rather, should happen) when
unicode strings with a different encoding are embedded in a template
(whether directly or via a variable/function). Is it safe to assume
utf-8? What if the default encoding is something else?
Regards,
Cliff
Besides, if you only allow utf-8 you can give users a helpful error
message when the unicode function fails, because in most of the cases
a document encoded in non-utf-8 cannot be read as utf-8. This is not
the case for 8-bit encodings, for example, so if you add support for
non utf-8 encodings, be prepared for lots of questions by confused
users seeing strange signs appearing instead of the expected
diacritics! :-)
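
To illustrate the point with a small sketch (the sample word and variable names are made up for this example): Latin-1 bytes are rejected loudly when read as UTF-8, whereas UTF-8 bytes read as Latin-1 decode without any error and just come out garbled.

```python
word = u"caf\u00e9"  # "café", an arbitrary sample word

latin1_bytes = word.encode("latin-1")
utf8_bytes = word.encode("utf-8")

# Reading Latin-1 data as UTF-8 fails, which allows a helpful error message.
try:
    latin1_bytes.decode("utf-8")
    utf8_rejected_latin1 = False
except UnicodeDecodeError:
    utf8_rejected_latin1 = True

# Reading UTF-8 data as Latin-1 never fails: every byte maps to some
# character, so the result is silently wrong.
garbled = utf8_bytes.decode("latin-1")
```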
regards,
== Olivier
This will make sure that all your template strings will be treated as
unicode (even without the u"").
I'm not sure whether my solution is the best or the simplest but it
works for me.
cheers,
== Olivier
I personally would not recommend doing it this way, because it is much
cleaner to just inject real unicode strings (not str representations
of unicode strings) into the template. That means that if you have a
component that produces UTF-8 strings, you need to decode that UTF-8
string into a unicode object using its decode() method before handing
it to Brevé. For example:
    result = something_that_produces_utf_8()
    result = result.decode("utf-8")
I think that Brevé should treat any 'str' strings as ASCII strings, so
that it assumes no encoding at all, and if an encoding IS used by
mistake, that that mistake will be caught immediately.
Do you agree that using unicode everywhere would be the cleaner
option?
--
With kind regards,
Sven
On Sep 29, 6:33 pm, Olivier Verdier <Olivier.Verd...@gmail.com> wrote:
Cliff,
My view on this is that Brevé should accept unicode strings and str
strings that only contain ASCII characters. That way, Brevé does not
assume anything (which is very Pythonic: in case of ambiguity, Python
always raises an exception). This can be accomplished by adding a
flattener for the 'str' type that does:
    return the_str_string.decode("us-ascii")
That way, an exception will be raised if str strings are passed in with
any encoding except for ASCII (which I regard as a kind of
null-encoding).
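
A rough sketch of such a strict flattener. The function name is hypothetical, and how it would be registered with Brevé is left out, since that part is not shown in this thread. Byte strings are written with a b prefix so the snippet behaves the same on Python 2.6+ and Python 3.

```python
def flatten_str_strict(value):
    """Decode a byte string as ASCII; anything else raises UnicodeDecodeError."""
    return value.decode("us-ascii")


# Pure ASCII input passes through and becomes a unicode string.
plain = flatten_str_strict(b"Title")

# Encoded non-ASCII input (here UTF-8) is caught immediately.
try:
    flatten_str_strict(u"Some \u20ac text".encode("utf-8"))
    rejected = False
except UnicodeDecodeError:
    rejected = True
```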
The programmer is the only one who knows for sure in what encoding
components outside of Brevé produce their strings. So, let the
programmer make sure that the encoding is handled correctly by
demanding that the encoding be undone before handing the result to
Brevé.
Just my view,
Sven
On Sep 29, 5:02 am, Cliff Wells <cl...@twisty-industries.com> wrote:
Sven,
Out of curiosity, what version of Breve are you using? 1.1.6 or SVN?
The above code appears to work under 1.1.6, but not SVN so I'm assuming
the latter.
Cliff
I stand corrected. It appears to work under 1.1.7 (an unreleased
version that lies somewhere between 1.1.6 and SVN head).
Would you mind testing against 1.1.6? SVN is known to be broken in a
couple places (I'm probably going to revert it back to a previous state
if I can't track down the issues).
Regards,
Cliff
I seem to be using version 1.1.7 according to:
>>> breve.__version__
'1.1.7'
I installed it on my Red Hat box using easy_install, which grabbed it
from the PyPI module repository. I just installed 1.1.6 on my FreeBSD
machine using its ports tree.
For some reason, I can't seem to reproduce the problem, neither with
version 1.1.6 nor with 1.1.7, neither on Linux nor on FreeBSD. Using
the code snippet from my first posting works in both versions. I don't
get it :-(
--
Regards,
Sven
I'm trying Brevé on a server now with UTF-8 strings, and it is so
annoying that it doesn't just work.
Please allow support for non-ASCII strings for all of us using more
than the 128 ASCII characters (Unicode allows you to use tens of
thousands of characters!!). Again, the ASCII users *won't see any
difference*.
Thanks a lot!
== Olivier
What I actually have in mind is that the global encoding will define
this "assumption". That is, if you set encoding='us-ascii' then it will
work like Sven suggests, if it's set to 'utf-8' then it works as you
suggest (and 'utf-8' will be the default).
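
As a hedged sketch of that idea (the factory function and its names are invented for illustration and are not part of Brevé), a single encoding setting could select between the two behaviours discussed in this thread:

```python
def make_str_flattener(encoding="utf-8"):
    """Return a flattener that decodes byte strings with the given encoding."""
    def flatten(value):
        return value.decode(encoding)
    return flatten


euro_bytes = u"Some \u20ac text".encode("utf-8")

lenient = make_str_flattener()           # default: assume UTF-8
strict = make_str_flattener("us-ascii")  # reject anything beyond ASCII

decoded = lenient(euro_bytes)

try:
    strict(euro_bytes)
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
```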
If anyone sees an issue with this, please speak up =)
Regards,
Cliff
Thanks a lot, Cliff, working with Brevé is really a treat. It's a
really clever way of doing templates. I don't think that I will touch
html code ever again. :-)
cheers,
== Olivier
I hope that all those changes will somehow be implemented in Brevé in the
not too distant future. ;-)
Thanks!
cheers,
== Olivier
Or rather:
    def quote_attrs ( attrs, default_encoding = 'utf-8'):
        ...
        v = unicode ( v, default_encoding )
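
Filled out into something runnable, this could look roughly like the sketch below. It is not Brevé's actual quote_attrs: the sorting, the escaping of double quotes and the handling of non-string values are assumptions added to keep the example self-contained, and it is written so that it also runs on Python 3, where byte strings are the `bytes` type.

```python
def quote_attrs(attrs, default_encoding="utf-8"):
    """Render a dict of attributes as a unicode string of name="value" pairs."""
    quoted = []
    for name, value in sorted(attrs.items()):  # sorted only for stable output
        if value is None:
            continue
        if isinstance(value, bytes):
            # Byte strings are decoded with the configured encoding.
            value = value.decode(default_encoding)
        else:
            # Unicode strings, numbers, etc. are converted to unicode text.
            value = u"%s" % (value,)
        value = (value.replace(u"&", u"&amp;")
                      .replace(u">", u"&gt;")
                      .replace(u"<", u"&lt;")
                      .replace(u'"', u"&quot;"))
        quoted.append(u' %s="%s"' % (name, value))
    return u"".join(quoted)


result = quote_attrs({
    "title": u"Caf\u00e9 & bar".encode("utf-8"),
    "id": None,
    "width": 10,
})
```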
I haven't been able to devote much time to working on these fixes of
late, but I'm hoping to get to them soon.
Cliff
Index: loaders.py
===================================================================
--- loaders.py (revision 267)
+++ loaders.py (working copy)
@@ -9,4 +9,4 @@
         return uid, timestamp
     def load ( self, uid ):
-        return unicode ( file ( uid, 'U' ).read ( ) )
+        return unicode ( file ( uid, 'U' ).read ( ), 'utf-8' )
Index: template.py
===================================================================
--- template.py (revision 267)
+++ template.py (working copy)
@@ -155,7 +155,7 @@
         try:
             bytecode = _cache.compile ( filename, T.root, T.loaders [ -1 ] )
-            output = flatten ( eval ( bytecode, _g, { } ) ).encode ( T.encoding )
+            output = flatten ( eval ( bytecode, _g, { } ) )
             T.xml_encoding = kw.get ( 'xml_encoding',
                 '''<?xml version="%s" encoding="%s"?>''' % ( T.xml_version, T.encoding ) )
         except:
Index: util.py
===================================================================
--- util.py (revision 267)
+++ util.py (working copy)
@@ -40,7 +40,7 @@
     quoted = [ ]
     for a, v in attrs.items ( ):
         if v is None: continue
-        v = str ( v )
+        v = unicode ( v, 'utf-8' )
         v = '"' + v.replace ( "&", "&amp;"
             ).replace ( ">", "&gt;"
             ).replace ( "<", "&lt;"
Index: plugin/django_adapter.py
===================================================================
--- plugin/django_adapter.py (revision 267)
+++ plugin/django_adapter.py (working copy)
@@ -9,7 +9,7 @@
 BREVE_ROOT = settings.BREVE_ROOT
 def flatten_string ( obj ):
-    return unicode ( obj ).encode ( settings.DEFAULT_CHARSET )
+    return obj
 class _loader ( object ):
     def __init__ ( self, root, breve_opts = None ):
@@ -40,6 +40,7 @@
         self.breve_opts = breve_opts
     def render ( self, vars = None ):
+        import os # why??
         if vars == None:
             vars = { }
         elif isinstance ( vars, Context ):