managing latin characters

Andrea Fae'

Jun 18, 2018, 3:40:45 PM
to web2py-users
How do I manage Latin characters, such as Italian characters like è, é, à, ò?
If I try to insert, for example, à, it gets "translated" to "/xe0153".
How do I work with this type of character? Do I need to change from utf-8 to something else? Thanks

Richard Vézina

Jun 18, 2018, 4:19:59 PM
to web2py-users
It depends. If the text is coming from a form, you have nothing to do. If you add such characters to your code as hardcoded values, you have to encode them properly; you can't just write 'ton text en français', as it is going to fail.

You are always better off working in English in your code anyway. It does not make much sense to hardcode strings in another language and use the magic T() only some of the time: your app will be translated automatically wherever you use T(), but not where you hardcoded text in a language other than English.

If you use other software or sources of information that come with various encodings (they had better support UTF-8), you will have to call variable_name.decode('utf-8') once when you get the data in, and variable_name.encode('utf-8') when you push data out to that other software. If you do this kind of thing, I recommend you search for Python and encoding and read the docs.
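As a rough illustration of the decode-on-input, encode-on-output pattern described above, here is a minimal sketch. The function names and the choice of latin-1 as the external encoding are assumptions made for the example, not something stated in the thread.

```python
# -*- coding: utf-8 -*-
# Sketch: decode once on the way in, encode once on the way out.
# The latin-1 source encoding and the helper names are hypothetical.

def from_external(raw_bytes, source_encoding='latin-1'):
    """Decode bytes received from another system into a unicode string."""
    return raw_bytes.decode(source_encoding)


def to_external(text, target_encoding='utf-8'):
    """Encode a unicode string into bytes before sending it elsewhere."""
    return text.encode(target_encoding)


raw = b'Don\xe0'                 # "Donà" as latin-1 bytes
text = from_external(raw)        # unicode text, used inside the application
utf8_bytes = to_external(text)   # UTF-8 bytes for the outside world
```

The point of the pattern is that the application only ever handles unicode text internally; bytes exist only at the boundaries.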

Richard

Leonel Câmara

Jun 18, 2018, 5:50:05 PM
to web2py-users
Web2py uses utf-8 everywhere by default, so you don't need to worry about it. If you're smart you will use utf-8 in the database too, and everything pretty much just works. Otherwise you just have to tell the DAL which encoding the database uses, via db_codec.

Andrea Fae'

Jun 19, 2018, 3:00:03 PM
to web2py-users
Thank you Leonel, but when I try to insert, for example, the name "Donà" into the username field (table auth_user), the system tells me that it is not possible. So, what should I do? Convert "Donà" to "Dona" without the accent? Which characters can I use in the username field? In the book there is only db_codec=latin... Where can I find documentation about encoding? In the Python documentation? Thank you

Dave S

Jun 19, 2018, 4:40:08 PM
to web2py-users

You probably want to look at chapter 7.8 of the Python Standard Library docs:
<URL:https://docs.python.org/2.7/library/codecs.html>
especially section 7.8.2.
Also, chapter 7.9 describes the tools for getting unicode properties, and you can find the numeric value for a given character that way.
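For instance, a small sketch of looking up the numeric value and properties of a character with the standard unicodedata module (the variable names are just for illustration):

```python
# -*- coding: utf-8 -*-
import unicodedata

ch = u'\xe0'                                   # the character "à"
code_point = ord(ch)                           # numeric value of the character
hex_form = u'U+%04X' % code_point              # conventional notation
name = unicodedata.name(ch)                    # official Unicode name
decomposed = unicodedata.normalize('NFD', ch)  # base letter + combining accent

print(hex_form)   # U+00E0
print(name)       # LATIN SMALL LETTER A WITH GRAVE
```

The NFD form splits "à" into a plain "a" followed by the combining grave accent U+0300, which is one way to see how an accented letter relates to its ASCII base.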

I would expect the DAL connectors to handle most of the common encodings, and Leonel points out they can talk UTF-8 to a database that supports it (find me an example that doesn't).
In the book, the DAL signature shows UTF-8 as the default codec.
<URL:http://web2py.com/books/default/chapter/29/06/the-database-abstraction-layer?search=unicode#DAL-signature>


If you really need to be ASCII-safe: Since URLs are supposed to be ASCII, unicode gets escaped ... look at chapter 20.5.2 for urllib.quote():
<URL:https://docs.python.org/2.7/library/urllib.html#utility-functions>
or the web2py helper function xmlescape()
<URL:http://web2py.com/books/default/chapter/29/05/the-views#xmlescape>
to provide a "clean ascii" representation if you need to.
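A minimal sketch of the URL-escaping route, assuming the text is encoded as UTF-8 before quoting. The import fallback is only there so the snippet runs on both Python 2 (urllib.quote, as in the docs linked above) and Python 3 (urllib.parse.quote).

```python
# -*- coding: utf-8 -*-
try:
    from urllib import quote          # Python 2
except ImportError:
    from urllib.parse import quote    # Python 3

name = u'Don\xe0'                       # "Donà"
escaped = quote(name.encode('utf-8'))   # percent-encode the UTF-8 bytes

print(escaped)   # Don%C3%A0
```

The result contains only ASCII characters, at the cost of no longer being readable as the original name.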

I know, TL;DR.  But I hope this gives you some pointers to your choices.

Dave S
/dps

Anthony

Jun 19, 2018, 5:56:58 PM
to web2py-users
On Tuesday, June 19, 2018 at 3:00:03 PM UTC-4, Andrea Fae' wrote:
Thank you Leonel, but when I try to insert to db for example this name "Donà" in the username field (table auth_user) the system tells me that is not possible...So, what to do? Convert "Donà" in "Dona" without accent? Which characters can I use in the username field? In the book there is only db_codec=latin...where can I find documentation about encoding ....in python documentation? thank you

The default validators for the username field are:

[IS_MATCH('[\w\.\-]+', strict=True,
          error_message=self.messages.invalid_username),
 IS_NOT_IN_DB(db, '%s.username' % settings.table_user_name,
              error_message=self.messages.username_taken)]

You could replace the IS_MATCH with a custom validator that accepts unicode characters, or just remove IS_MATCH altogether.
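To illustrate what "accepts unicode characters" could mean here, the sketch below uses only the standard re module, not web2py itself. The same character class as the default validator is compiled with re.UNICODE, so \w also matches accented letters. The helper name is hypothetical, and how such a check is wired into auth_user as a validator depends on your web2py version and is not shown.

```python
# -*- coding: utf-8 -*-
import re

# Same character class as the default IS_MATCH, anchored to the whole
# string (like strict=True) and compiled with re.UNICODE so that \w
# covers accented letters as well as [a-zA-Z0-9_].
USERNAME_PATTERN = re.compile(r'\A[\w.\-]+\Z', re.UNICODE)


def is_valid_username(name):
    """Return True if the unicode string `name` is an acceptable username."""
    return USERNAME_PATTERN.match(name) is not None


print(is_valid_username(u'Don\xe0'))    # True  ("Donà")
print(is_valid_username(u'Don a'))      # False (contains a space)
```

On Python 2, a byte-string pattern without re.UNICODE limits \w to ASCII letters, digits and underscore, which would explain why "Donà" is rejected by the default validator.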

Anthony