The Unicode problem seems to crop up on this list a lot, so here's
what I've done to solve my problems.
My particular problem is that I need to be able to deal with Unicode
data in the URLs as well as the regular request GET/POST data.
This is a piece of middleware that I'm using to force all incoming
data to be UTF-8. If you also add a meta tag to the head section
of your template to declare utf-8, I think IE will actually do the
right thing and not do its weird charset guessing.
Setting that meta tag, along with explicitly setting
settings.DEFAULT_CHARSET to 'utf-8', and then applying this middleware
layer seems to get all my string data inside Django to be clean UTF-8
data.
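For anyone wiring this up, the settings side might look roughly like the fragment below. DEFAULT_CHARSET and MIDDLEWARE_CLASSES are the real setting names; the module path 'myproject.middleware.UTF8Filter' is only an assumption, so point it at wherever you saved the class.

```python
# settings.py (fragment) -- a sketch, not a complete settings file.

# Charset Django advertises in the Content-Type header of its responses.
DEFAULT_CHARSET = 'utf-8'

# 'myproject.middleware.UTF8Filter' is a hypothetical path; adjust it to
# the module where the UTF8Filter class actually lives.
MIDDLEWARE_CLASSES = (
    'myproject.middleware.UTF8Filter',
    'django.middleware.common.CommonMiddleware',
)

# The matching tag for the <head> of the base template would be:
#   <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
```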
This has the advantage over a 'full' Unicode conversion of Django
that you don't have to touch any existing Django code, and you don't
have to enforce non-obvious rules like implementing "__unicode__" and
using "unicode()" instead of "str()" everywhere.
Anyway, I hope this is of use to people.
The utf8_encode function is probably overly paranoid, but well... I
really don't trust IE to send properly encoded data.
vic
 1 import types
 2
 3 '''
 4 This filter will force any incoming GET or POST data to become UTF8 data for
 5 processing inside of Django.
 6 '''
 7
 8 class UTF8Filter(object):
 9     def process_request(self, request):
10         get_parms = request.GET
11         post_parms = request.POST
12
13         request.GET._mutable = True
14         request.POST._mutable = True
15
16         for cgiargs in [request.GET, request.POST]:
17             for key, vallist in cgiargs.items():
18                 tmp_values = []
19                 if isinstance(vallist, types.ListType):
20                     for i, val in enumerate(vallist):
21                         tmp_values.append(utf8_encode(val))
22                 else:
23                     tmp_values = [utf8_encode(vallist),]
24
25                 cgiargs.setlist(key, tmp_values)
26
27         # Rewrite the request path as UTF8 data for Ajax calls
28         request.path = utf8_encode(request.path)
29
30         request.GET._mutable = False
31         request.POST._mutable = False
32
33         return None
34
35 def utf8_encode(val):
36     try:
37         tmp = val.decode('utf8')
38     except:
39         try:
40             tmp = val.decode('latin1')
41         except:
42             tmp = val.decode('ascii', 'ignore')
43     tmp = tmp.encode('utf8')
44     return tmp
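For readers on Python 3, the same try-UTF-8-then-fall-back idea could be sketched as below. This is not the posted middleware, just a standalone illustration; the name force_utf8 is made up for the example, and it assumes the input arrives as raw bytes.

```python
def force_utf8(raw):
    """Return the bytes in `raw` re-encoded as UTF-8.

    Hypothetical helper for illustration: tries UTF-8 first, then
    falls back to latin-1 for anything that is not valid UTF-8.
    """
    try:
        text = raw.decode('utf-8')
    except UnicodeDecodeError:
        # latin-1 maps every byte value to a character, so this
        # decode cannot raise.
        text = raw.decode('latin-1')
    return text.encode('utf-8')


already_utf8 = force_utf8(b'caf\xc3\xa9')   # valid UTF-8 passes through
from_latin1 = force_utf8(b'caf\xe9')        # latin-1 byte gets converted
```

Because a latin-1 decode never fails, a third fallback after it (like the 'ascii' branch in the listing above) is never reached. It also means input in some other encoding gets silently treated as latin-1 rather than rejected.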
What was your motivation to create all this?
I'm asking because I suspect my problem
(http://groups-beta.google.com/group/django-users/browse_thread/thread/a9b53db451aa4590)
is somehow related to these issues.
hi,
well, in my experience, the most important thing is the content-type
http header. if you explicitly state the charset there, then the browser
will use that, and completely ignore the charset-specification in the
html file.
also, may i ask, why such a paranoid way of working with GET/POST?
because (also, only my experience, no big testing), the browsers submit
their form-data in the charset of the page that contained the form.
so if you send the browser a utf-8 page, its submitted data is
going to be utf-8.
gabor
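To make the charset point concrete, here is a small Python 3 sketch using only the standard library. The query strings are made-up examples: the same character "é" is percent-encoded once as UTF-8 and once as latin-1, and the result depends on which charset the server assumes when decoding.

```python
from urllib.parse import parse_qs

# "é" submitted from a utf-8 page vs. from a latin-1 page.
utf8_query = 'name=%C3%A9'
latin1_query = 'name=%E9'

# Decoding with the charset the browser actually used works fine.
ok_utf8 = parse_qs(utf8_query, encoding='utf-8')
ok_latin1 = parse_qs(latin1_query, encoding='latin-1')

# Decoding latin-1 bytes as utf-8 yields the replacement character,
# because parse_qs uses errors='replace' by default.
mismatch = parse_qs(latin1_query, encoding='utf-8')
```

So if every browser really does submit in the page's charset, decoding as utf-8 is enough; the paranoia only pays off when a client sends something else.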
First off, I just realized that the code I posted earlier has a small bug.
Line 17 should've read:
17 for key, vallist in cgiargs.lists():
The old code used 'items()', which only pulls a single value out of
a MultiValueDict.
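The difference is easy to see with a toy stand-in. TinyMultiDict below is a hypothetical class written for this example, not Django's MultiValueDict; it only mimics the behavior that matters here, where a plain lookup returns the last value for a key.

```python
class TinyMultiDict(dict):
    """Hypothetical stand-in: every key maps to a list of values."""

    def __getitem__(self, key):
        # Plain lookup returns only the last value stored for the key.
        return dict.__getitem__(self, key)[-1]

    def items(self):
        # One value per key, so repeated parameters are lost.
        return [(key, self[key]) for key in self]

    def lists(self):
        # The full list of values for each key.
        return [(key, dict.__getitem__(self, key)) for key in self]


params = TinyMultiDict({'tag': ['django', 'unicode']})
single = params.items()
full = params.lists()
```

With items(), a request like ?tag=django&tag=unicode would be rewritten with only one tag; lists() keeps both.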
On to unicode....
The reason I'm paranoid about handling GET/POST data is that MSIE
can't be trusted to handle charsets correctly.
Here are two good references:
http://ln.hixie.ch/?start=1144794177&count=1
http://www.joelonsoftware.com/articles/Unicode.html
Basically, IE ignores the content-type header and figures out the
content type by doing content sniffing.
So sometimes, IE guesses wrong - and you get garbage if you just use
the Content-Type header. If you use the meta tag, it forces UTF8 in
almost all browsers.
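Doing both at once costs nothing. As a rough sketch outside of Django, a bare WSGI app that declares the charset in the header and in the meta tag could look like this (the page content and the name app are just for illustration):

```python
# The same charset is declared twice: once in the HTTP header and once
# in the document itself, for browsers that sniff the content.
PAGE = (
    '<html><head>'
    '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
    '</head><body>caf\u00e9</body></html>'
)


def app(environ, start_response):
    """Minimal WSGI application serving PAGE as UTF-8."""
    body = PAGE.encode('utf-8')
    start_response('200 OK', [
        ('Content-Type', 'text/html; charset=utf-8'),
        ('Content-Length', str(len(body))),
    ])
    return [body]
```

In Django the header half comes from DEFAULT_CHARSET; the meta tag has to go into the template by hand.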
victor "MSIE is a four letter word" ng
On 12/11/06, Gábor Farkas <ga...@nekomancer.net> wrote:
> well, in my experience, the most important thing is the content-type
> http header. if you explicitly state the charset there, then the browser
> will use that, and completely ignore the charset-specification in the
> html file.
>
> also, may i ask, why such a paranoid way of working with GET/POST?
> because (also, only my experience, no big testing), the browsers submit
> their form-data in the charset of the page that contained the form.
>
> so if you send the browser a utf-8 page, its submitted data is
> going to be utf-8.
>
>
> gabor
--
"Never attribute to malice that which can be adequately explained by
stupidity." - Hanlon's Razor
I don't have mysql5 to test with right now, but I have tested my stuff
against sqlite and it seems to work there, so I can't imagine that
this will cause you problems on mysql.
My use case is probably like yours - I need multilingual support since
I have to handle names of countries and people from all over the
world.
vic
I honestly can't think of a good reason to do anything other than UTF8
unless you've got some weird requirement to do otherwise.
vic
On 12/12/06, favo <Favo...@gmail.com> wrote:
>
> I think you'd better enforce de/encoding to settings.DEFAULT_CHARSET in
> the middleware, not hardcode utf8.
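One way to read that suggestion is to make the target charset a parameter instead of a literal. The Python 3 sketch below is an illustration only: force_charset is a hypothetical name, and in a Django project the charset argument would presumably come from settings.DEFAULT_CHARSET.

```python
def force_charset(raw, charset='utf-8'):
    """Re-encode the bytes in `raw` into `charset`.

    Hypothetical helper: tries the target charset first, then falls
    back to latin-1, which accepts any byte sequence.
    """
    try:
        text = raw.decode(charset)
    except UnicodeDecodeError:
        text = raw.decode('latin-1')
    # 'replace' avoids an exception when the target charset cannot
    # represent a character; it substitutes '?' instead.
    return text.encode(charset, 'replace')


as_utf8 = force_charset(b'caf\xe9')            # default target: utf-8
as_ascii = force_charset(b'caf\xe9', 'ascii')  # lossy target
```

The ascii case shows the cost of a narrow target charset: anything it cannot represent is replaced, which is a fair argument for sticking with UTF-8.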