forcing UTF8 data inside django

forcing UTF8 data inside django Victor Ng 12/10/06 8:02 PM
Hi all,

The unicode problem seems to creep up in this list a lot, so here's
what I've done to solve my problems.

My particular problem is that I need to be able to deal with Unicode
data in the URLs as well as the regular request GET/POST data.

This is a piece of middleware that I'm using to force all incoming
data to be UTF-8.  If you also add a meta tag to the head section
of your template declaring utf-8, I think IE will actually do the
right thing and skip its weird charset guessing.

Setting that meta tag, explicitly setting settings.DEFAULT_CHARSET to
'utf-8', and then applying this middleware layer seems to get all my
string data inside Django to be clean UTF-8 data.
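
A minimal sketch of the two declarations mentioned above. DEFAULT_CHARSET is the setting named in this post; the META_TAG constant is a hypothetical name used only for illustration, and the tag itself would normally live in the template's head section rather than in Python.

```python
# settings.py (fragment): the charset Django declares on responses.
DEFAULT_CHARSET = 'utf-8'

# Hypothetical constant holding the meta tag that would go inside <head>
# of the base template; it repeats the same charset for the browser.
META_TAG = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
```

Keeping the header and the meta tag on the same charset avoids giving the browser two conflicting declarations.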

The advantage over a 'full' unicode conversion of Django is that you
don't have to touch any existing Django code, and you don't have to
enforce non-obvious rules like implementing "__unicode__" and
using "unicode()" instead of "str()" everywhere.

Anyway, I hope this is of use to people.

The utf8encode function is probably overly paranoid, but well... I
really don't trust IE to send properly encoded data.

vic

      1 import types
      2
      3 '''
      4 This filter will force any incoming GET or POST data to become UTF8 data for
      5 processing inside of Django.
      6 '''
      7
      8 class UTF8Filter(object):
      9     def process_request(self, request):
     10         get_parms = request.GET
     11         post_parms = request.POST
     12
     13         request.GET._mutable = True
     14         request.POST._mutable = True
     15
     16         for cgiargs in [request.GET, request.POST]:
     17             for key, vallist in cgiargs.items():
     18                 tmp_values = []
     19                 if isinstance(vallist, types.ListType):
     20                     for i, val in enumerate(vallist):
     21                         tmp_values.append(utf8_encode(val))
     22                 else:
     23                     tmp_values = [utf8_encode(vallist),]
     24
     25                 cgiargs.setlist(key, tmp_values)
     26
     27         # Rewrite the request path as UTF8 data for Ajax calls
     28         request.path = utf8_encode(request.path)
     29
     30         request.GET._mutable = False
     31         request.POST._mutable = False
     32
     33         return None
     34
     35 def utf8_encode(val):
     36     try:
     37         tmp = val.decode('utf8')
     38     except:
     39         try:
     40             tmp = val.decode('latin1')
     41         except:
     42             tmp= val.decode('ascii', 'ignore')
     43     tmp = tmp.encode('utf8')
     44     return tmp
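
For readers on Python 3, here is a rough sketch of the same fallback idea applied to a bytes value. This is an assumption-laden rewrite, not the middleware above: the name force_utf8 is hypothetical, and the input is assumed to be raw bytes. One thing it makes visible is that latin-1 can decode every possible byte, so a further ascii fallback would never be reached.

```python
def force_utf8(raw: bytes) -> bytes:
    """Return raw re-encoded as UTF-8, guessing latin-1 if it is not valid UTF-8."""
    try:
        text = raw.decode('utf-8')
    except UnicodeDecodeError:
        # latin-1 maps all 256 byte values to characters, so this cannot fail.
        text = raw.decode('latin-1')
    return text.encode('utf-8')


already_utf8 = force_utf8(b'caf\xc3\xa9')   # valid UTF-8 passes through unchanged
from_latin1 = force_utf8(b'caf\xe9')        # latin-1 bytes are transcoded to UTF-8
```

Note that the latin-1 guess is only a guess: bytes in some other legacy charset will decode without error but produce the wrong characters.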

Re: forcing UTF8 data inside django mezhaka 12/11/06 2:44 AM

What was your motivation to create all this?
The reason I am asking, I suppose my problem
(http://groups-beta.google.com/group/django-users/browse_thread/thread/a9b53db451aa4590)
is somehow related to these issues.

Re: forcing UTF8 data inside django Gábor Farkas 12/11/06 4:30 AM
Victor Ng wrote:
> Hi all,
>
> The unicode problem seems to creep up in this list a lot, so here's
> what I've done to solve my problems.
>
> My particular problem is that I need to be able to deal with Unicode
> data in the URLs as well as the regular request GET/POST data.
>
> This is a piece of middleware that I'm using to force all incoming
> data to be UTF-8.  If you also add in a meta tag in your head section
> of your template to declare  utf-8, I think IE will actually do the
> right thing and not do its weird charset guessing.
>

hi,

well, from my experience, the most important thing is the content-type
http header. if you explicitly state the charset there, then the browser
will use that, and completely ignore the charset specification in the
html file.

also, may i ask, why such a paranoid way of working with GET/POST?
because (again, only my experience, no big testing) browsers submit
their form data in the charset of the page that contained the form.

so if you send the browser a utf-8 page, its submitted data is
going to be utf-8.
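
a small sketch of declaring the charset in the content-type header, written as a bare WSGI app on Python 3 so it needs nothing beyond the standard library. the app name and the page body are made up for illustration; inside django the header would normally come from the framework's own settings instead.

```python
def app(environ, start_response):
    """Hypothetical WSGI app that declares UTF-8 in the Content-Type header."""
    body = 'na\u00efve caf\u00e9'.encode('utf-8')
    headers = [
        # The charset parameter tells the browser how to decode the body.
        ('Content-Type', 'text/html; charset=utf-8'),
        ('Content-Length', str(len(body))),
    ]
    start_response('200 OK', headers)
    return [body]
```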


gabor

Re: forcing UTF8 data inside django Victor Ng 12/11/06 8:08 AM
Hi Gabor,

First off, I just realized that the code I posted earlier has a small bug.

Line 17 should've read:

    17             for key, vallist in cgiargs.lists():

The old code used 'items()', which only pulls a single value per key
out of a MultiValueDict.
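
To illustrate the difference between items() and lists() described above without needing Django installed, here is a sketch using a tiny stand-in class on Python 3. TinyMultiValueDict is hypothetical and only mimics the two methods in question; having items() return the last value for each key is an assumption of this sketch.

```python
class TinyMultiValueDict(dict):
    """Hypothetical stand-in: each key maps to a list of submitted values."""

    def items(self):
        # Only one value per key survives (here: the last one submitted).
        return [(key, values[-1]) for key, values in super().items()]

    def lists(self):
        # Every value for every key is preserved.
        return [(key, list(values)) for key, values in super().items()]


params = TinyMultiValueDict({'tag': ['a', 'b']})
single = params.items()   # the second 'tag' value hides the first
full = params.lists()     # both 'tag' values are available for re-encoding
```

Looping over items() and then calling setlist() would therefore silently drop all but one value of a multi-valued field, which is why the loop needs lists().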

On to unicode....

The reason I'm paranoid about handling GET/POST data is that MSIE's
charset handling can't be trusted.

Here are two good references:

http://ln.hixie.ch/?start=1144794177&count=1
http://www.joelonsoftware.com/articles/Unicode.html

Basically, IE ignores the content-type header and figures out the
content type by doing content sniffing.

So sometimes, IE guesses wrong - and you get garbage if you just use
the Content-Type header.  If you use the meta tag, it forces UTF8 in
almost all browsers.

victor "MSIE is a four letter word" ng



--
"Never attribute to malice that which can be adequately explained by
stupidity."  - Hanlon's Razor

Re: forcing UTF8 data inside django Victor Ng 12/11/06 8:17 AM
Hi Anton,

I don't have MySQL 5 to test with right now, but I have tested my stuff
against SQLite and it seems to work there, so I can't imagine that
this will cause you problems on MySQL.

My use case is probably like yours - I need multilingual support since
I have to handle names of countries and people from all over the
world.

vic


--
"Never attribute to malice that which can be adequately explained by
stupidity."  - Hanlon's Razor

Re: forcing UTF8 data inside django favo 12/12/06 8:08 AM
I think you'd be better off de/encoding to settings.DEFAULT_CHARSET in
the middleware rather than hardcoding utf8.

Re: forcing UTF8 data inside django Victor Ng 12/12/06 2:27 PM
Unfortunately, not all charsets support all unicode characters,
so the fact that DEFAULT_CHARSET is configurable is mostly a moot
point for me.  For example, latin1 won't let me encode Asian
characters.

I honestly can't think of a good reason to do anything other than UTF8
unless you've got some weird requirement to do otherwise.
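
A quick sketch of the latin1 limitation, assuming Python 3 string semantics. The sample text and variable names are just for illustration.

```python
text = '\u6771\u4eac'  # "Tokyo" written in Japanese

try:
    text.encode('latin-1')
    latin1_ok = True
except UnicodeEncodeError:
    # latin-1 only covers code points U+0000 to U+00FF.
    latin1_ok = False

# UTF-8 can represent any Unicode character, so this always succeeds.
utf8_bytes = text.encode('utf-8')
```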

vic
