2009/3/21 AchipA <attila...@gmail.com>:
>
> Oops, wrong copypaste, here is the correct one:
>
> In [1]: validator = IS_LENGTH(1)
>
> In [2]: validator('a')
> Out[2]: ('a', None)
>
> In [3]: validator('aa')
> Out[3]: ('aa', 'too long!')
>
> In [4]: validator('á')
> Out[4]: ('\xc3\xa1', 'too long!')
>
> In [5]: validator('á'.decode('utf-8'))
> Out[5]: (u'\xe1', None)
>
> In [6]: validator('ж'.decode('utf-8'))
> Out[6]: (u'\u0436', None)
>
> I say Alexei found a bug :)
>
[...]
--
Alexei Vinidiktov
The thing is the project that I'm intending to use web2py for is a web
application for language learners, and I need to be sure that as many
languages as possible are correctly treated by the application.
So, I don't think it would be safe to use a Russian character for
calculating the length of a field as in charlen = lambda n:
n*len('л').
Unfortunately, due to the nature of the web application I'm planning
on using web2py for, I can't use a single-byte encoding for the
database or most tables.
The tables are going to store strings in many different languages of the world.
I was hoping that web2py could transparently communicate with
databases that are UTF8 encoded and that I would be able to do
operations on strings retrieved from databases without thinking about
their encodings.
That is the goal. It will never be 100%, as it is somewhat
database/version dependent. As you yourself write, it uses utf-8 encoded
strings (which is the Python 2.x norm, and this won't change to unicode
objects at least until web2py support for Python 3.0 arrives) and uses
utf8 data in the database. That being said, a quick glance at the
IS_LENGTH validator shows that it might not be using len() entirely
correctly; I think Massimo should take a look at it.
>>> len('a')
1
>>> len('á')
2
>>> len(u'á')
1
>>> len('á'.decode('utf-8'))
1
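The difference in the session above comes down to whether len() sees
bytes or characters. Here is a minimal sketch of a length check that
counts characters instead. It assumes Python 3 syntax (where str is
already Unicode) and UTF-8 input; char_length and check_max_length are
made-up names for illustration, not web2py's IS_LENGTH code:

```python
def char_length(value, encoding="utf-8"):
    """Return the number of Unicode code points in value.

    Byte strings are decoded first (assumed here to be UTF-8), so a
    multi-byte character counts as one.
    """
    if isinstance(value, bytes):
        value = value.decode(encoding)
    return len(value)


def check_max_length(value, maxsize, error_message="too long!"):
    """Hypothetical stand-in for a length validator.

    Returns (value, None) on success or (value, error_message) on
    failure, mirroring the tuples shown in the thread.
    """
    if char_length(value) <= maxsize:
        return (value, None)
    return (value, error_message)


# A UTF-8 encoded 'á' is two bytes but one character, so it passes.
print(check_max_length("á".encode("utf-8"), 1))
# Two characters against a limit of one still fails.
print(check_max_length("aa", 1))
```

Note that this counts code points, so a letter written as a base
character plus a combining accent would still count as two.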
Thanks for your input, Yarko. I've read the articles you mentioned and
I understand UTF8 better now. You are right about the 3 byte
assumption. It's a pretty safe bet for my purposes.
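For reference, here is a quick way to see how many bytes UTF-8 needs per
character. The sample characters are arbitrary picks and the snippet
assumes Python 3. Characters in the Basic Multilingual Plane take at
most 3 bytes; only those beyond it take 4:

```python
# Arbitrary sample characters, written as escapes to avoid
# any dependence on the encoding of this message.
samples = [
    "a",           # U+0061, ASCII
    "\u00e1",      # U+00E1, Latin small a with acute
    "\u0436",      # U+0436, Cyrillic zhe
    "\u20ac",      # U+20AC, euro sign
    "\u4e2d",      # U+4E2D, a CJK ideograph
    "\U0001d11e",  # U+1D11E, musical G clef (outside the BMP)
]


def utf8_width(ch):
    """Number of bytes needed to encode one character as UTF-8."""
    return len(ch.encode("utf-8"))


for ch in samples:
    print("U+%04X" % ord(ch), utf8_width(ch))
```

So sizing a field at 3 bytes per character covers everything in the
BMP, and falls short only for supplementary characters.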
I hope the project I'm working on will be shaping up in the coming
months, and that I'll have enough news to share about the progress.
Anyway, as I'm only beginning to work with web2py, I'm going to have
quite a few questions to ask.
[...]
--
Alexei Vinidiktov