What is the difference in character encoding handling between pure Python and Django?


Sugita Shinsuke

Apr 27, 2014, 2:13:48 AM
to django...@googlegroups.com
Hi there

I’d like to run Java code via Django.

The Java program, javaprogram, is used like this:


java javaprogram [text] [file_name]


text is a parameter; multi-byte characters are also okay.
file_name is the name of the file to generate.

When I run the stand-alone Python program below, it runs fine.

java_file = ‘javaprogram’
file_name = ‘filename’
text = ‘あいうえお’ #Japanese character
java_file_path = ‘/path/to/‘
class_path = ‘-cp ' + java_file_path

cmd = “java {0} {1} {2} {3}".format(class_path, java_file, text, file_name)

import subprocess
proc = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)


But when I run the same program in Django, it doesn't work.
However, if text is English, it works fine.

What is the difference in how character encodings are handled between pure Python and Django?
And could you tell me how to resolve it?

Thank you.

François Schiettecatte

Apr 27, 2014, 10:50:55 AM
to django...@googlegroups.com
You should check the encoding of stdout when running from Django; I suspect it is plain ASCII rather than UTF-8, which is probably what you get when running standalone. Check sys.getdefaultencoding().

Note that this has nothing to do with Django, just the way stdin/stdout are set up depending on how your script is run.

Also see:

http://stackoverflow.com/questions/1473577/writing-unicode-strings-via-sys-stdout-in-python
http://stackoverflow.com/questions/15740236/stdout-encoding-in-python
http://stackoverflow.com/questions/492483/setting-the-correct-encoding-when-piping-stdout-in-python

François
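
A minimal sketch of the check suggested above, assuming it is run once from a standalone script and once from inside the Django process so that the two reports can be compared. The helper name encoding_report is hypothetical; the functions it calls are all from the standard library.

```python
import locale
import sys


def encoding_report():
    """Collect the encodings Python reports for the current process."""
    return {
        "sys.getdefaultencoding": sys.getdefaultencoding(),
        "sys.getfilesystemencoding": sys.getfilesystemencoding(),
        "locale.getpreferredencoding": locale.getpreferredencoding(),
        # stdout may be replaced by an object without an encoding attribute
        "sys.stdout.encoding": getattr(sys.stdout, "encoding", None),
    }


for name, value in sorted(encoding_report().items()):
    print("{0}: {1}".format(name, value))
```

Any value that differs between the two runs points at the environment (for example the locale variables the process was started with) rather than at the program itself.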
> --
> You received this message because you are subscribed to the Google Groups "Django users" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to django-users...@googlegroups.com.
> To post to this group, send email to django...@googlegroups.com.
> Visit this group at http://groups.google.com/group/django-users.
> To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/cd081056-2a39-40f5-94c8-7b470bf827ea%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.


Sugita Shinsuke

May 5, 2014, 7:12:52 AM
to django...@googlegroups.com
Dear François Schiettecatte 

Hello.
Thank you for replying.

I checked sys.getdefaultencoding() in both Django and pure Python on the server.

Django returns 'ascii'.
Pure Python returns 'ascii' too.


I also checked locale.getpreferredencoding().

Django returns 'ascii'.
But pure Python returns 'UTF-8'.
So the two differ.

Could this be the cause of my problem?
If it is, can I change the encoding that locale.getpreferredencoding() returns?

WBR Shinsuke


On Sunday, April 27, 2014 at 11:50:55 PM UTC+9, François Schiettecatte wrote:

Sugita Shinsuke

May 6, 2014, 2:28:13 AM
to django...@googlegroups.com
Dear François

I checked Django's encoding again.

locale.getpreferredencoding() is
ANSI_X3.4-1968

sys.getdefaultencoding() is
ascii

On Monday, May 5, 2014 at 8:12:52 PM UTC+9, Sugita Shinsuke wrote:

杉田臣輔

May 6, 2014, 10:31:44 AM
to Hannu Krosing, django...@googlegroups.com
Hello

Thank you for replying.
I had already tried the ".encode('utf8')" method before.

I also tried your suggestion, but I got the error below.

'ascii' codec can't decode byte 0xe3 in position 0: ordinal not in range(128)
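
The message above is consistent with calling .encode() on a Python 2 byte string: Python 2 first decodes the bytes implicitly with the default codec (ascii) and only then encodes them. A small sketch that makes the hidden step explicit, assuming the text arrives as UTF-8 bytes; it behaves the same on Python 2.7 and Python 3:

```python
# -*- coding: utf-8 -*-
# Byte string holding the UTF-8 encoding of the text, which is what a
# Python 2 `str` literal contains in a UTF-8 source file.
raw = u"あいうえお".encode("utf-8")

try:
    # Python 2 performs this step silently when .encode() is called on a
    # byte string; written out explicitly it fails in the same way.
    raw.decode("ascii")
    message = None
except UnicodeDecodeError as exc:
    message = str(exc)

print(message)

# Decoding with the codec the bytes were really written in succeeds.
text = raw.decode("utf-8")
```

Under that assumption the strings passed to format() were already encoded bytes, so the remedy would be to decode them (or to leave them alone) rather than to encode them a second time.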




2014-05-06 17:10 GMT+09:00 Hannu Krosing <ha...@2ndquadrant.com>:
On 05/06/2014 08:28 AM, Sugita Shinsuke wrote:

...

If it is the cause. Can I change encode type of locale.getpreferredencoding?
...

>
> So, I run the stand-alone Python program like below could run fine.
> —
> java_file = ‘javaprogram’
> file_name = ‘filename’
> text = ‘あいうえお’ #Japanese character
> java_file_path = ‘/path/to/‘
> class_path = ‘-cp ' + java_file_path
>
> cmd = “java {0} {1} {2} {3}".format(class_path, java_file, text, file_name)
You can check if encoding the arguments manually works:

cmd = "java {0} {1} {2} {3}".format(*[s.encode('utf8') for s in (class_path, java_file, text, file_name)])


Cheers
-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ

Sithembewena Lloyd Dube

May 6, 2014, 11:20:03 AM
to django...@googlegroups.com
That should be ".encode('utf-8')" - it is separated by a dash.


--
Regards,
Sithu Lloyd Dube
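
For what it is worth, Python's codec registry resolves both spellings to the same codec, so the spelling alone would not be expected to change the result. A quick check using the standard codecs module:

```python
import codecs

# Each alias is looked up in the codec registry; .name is the canonical name.
aliases = ("utf8", "utf-8", "UTF-8", "utf_8")
names = [codecs.lookup(alias).name for alias in aliases]
print(names)

# Both spellings therefore produce identical bytes.
same_bytes = u"\u3042".encode("utf8") == u"\u3042".encode("utf-8")
```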

Sugita Shinsuke

May 7, 2014, 11:40:18 AM
to django...@googlegroups.com
Hello Lloyd Dube

I changed 'utf8' to 'utf-8',
but the same error happened.

'ascii' codec can't decode byte 0xe3 in position 0: ordinal not in range(128)

Isn't "locale.getpreferredencoding() is ANSI_X3.4-1968"
the cause?

On Wednesday, May 7, 2014 at 12:20:03 AM UTC+9, Lloyd Dube wrote:

Tom Evans

May 7, 2014, 12:23:01 PM
to django...@googlegroups.com
On Sun, Apr 27, 2014 at 7:13 AM, Sugita Shinsuke <shin...@gmail.com> wrote:
> Hi there
>
> I’d like to run Java code via Django.
>
> The Java code, javaprogram use like below.
>
> —
> java javaprogram [text] [file_name]
> —
>
> text is parameter. multi-byte character is also okey.
> file_name is generate file name.
>
> So, I run the stand-alone Python program like below could run fine.
> —
> java_file = ‘javaprogram’
> file_name = ‘filename’
> text = ‘あいうえお’ #Japanese character
> java_file_path = ‘/path/to/‘
> class_path = ‘-cp ' + java_file_path
>
> cmd = “java {0} {1} {2} {3}".format(class_path, java_file, text, file_name)

What version of Python? 2.x or 3.x?

Presumably your email client is converting plain single and double
quotes to smart quotes and backticks, and in your source code they are
plain single and double quotes - " and ', not “ and ‘.

Django does something explicitly to force management commands to be
ascii, see here:

https://docs.djangoproject.com/en/1.6/howto/custom-management-commands/#management-commands-and-locales

There was an old case to force Django to use UTF-8:
https://code.djangoproject.com/ticket/5877

It was closed wontfix on the basis that Django respects your locale,
which it no longer does by default. There are several workarounds
listed on the first link that will restore that behaviour. Within
view code, the rules are different: it will activate either the
language that the user requests (if supported and USE_I18N=True) or
LANGUAGE_CODE otherwise.

Cheers

Tom

Sugita Shinsuke

May 8, 2014, 12:05:55 AM
to django...@googlegroups.com, teva...@googlemail.com
Hello Tom Evans


> plain single and double quotes - " and ', not “ and ‘. 
My e-mail client is the Gmail web client.

>What version of Python? 2.x or 3.x? 
Python version is 2.7.5
And, Django version is 1.3.7

>There was an old case to force Django to use UTF-8: 
I added
sys.stdout = codecs.getwriter('utf-8')(sys.stdout)
but that did not resolve the problem.

>language that user requests (if supported and USE_i18N=True) or 
LANGUAGE_CODE otherwise. 

Both of them are True.
USE_I18N = True
USE_L10N = True


On Thursday, May 8, 2014 at 1:23:01 AM UTC+9, Tom Evans wrote:

Tom Evans

May 8, 2014, 4:45:52 AM
to django...@googlegroups.com
On Thu, May 8, 2014 at 5:05 AM, Sugita Shinsuke <shin...@gmail.com> wrote:
> Hello Tom Evans
>
>
>> plain single and double quotes - " and ', not “ and ‘.
> My e-mail client is Gmail web client.
>
>>What version of Python? 2.x or 3.x?
> Python version is 2.7.5
> And, Django version is 1.3.7

Django 1.3.7 is very old; it has known security holes in it and is not
maintained. What I talk about from here on down is not relevant for
1.3 - upgrade to at least 1.6.

In Python 2, you should mark your strings as unicode if they contain
anything other than ascii.

u'This is a unicode string'

'This is a byte string'

The characters within the string should be in the encoding specified
for the current file. See:

https://docs.python.org/2/howto/unicode.html#unicode-literals-in-python-source-code

Unicode strings are converted to the correct encoding for your
environment when output (running a command "outputs" the string to the
shell).

So, if you want your management command to output UTF-8:

1) Mark the strings in your program as unicode
2) Mark the files containing unicode string literals to denote the
character encoding used by the string literals
3) Ensure the environment that django is run in has the locale
correctly specified.
4) Ensure your management command either:
   a) activates a fixed language prior to outputting unicode, or
   b) instructs Django to use the locale from the environment.
   See https://docs.djangoproject.com/en/1.6/howto/custom-management-commands/#management-commands-and-locales
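
Steps 1 and 2 can be sketched as follows. This is only an illustration under the assumption that the source file is saved as UTF-8; it does not cover the locale or management command settings from steps 3 and 4.

```python
# -*- coding: utf-8 -*-
# Step 2: the declaration above tells Python 2 how the string literals in
# this file are encoded (Python 3 already assumes UTF-8).
import sys

# Step 1: the u prefix marks this as a unicode string, not a byte string.
text = u"あいうえお"

# Converting to bytes explicitly at the output boundary avoids relying on
# whatever encoding the environment happens to report.
encoded = text.encode("utf-8")

sys.stdout.write("{0} characters, {1} bytes\n".format(len(text), len(encoded)))
```

The five characters become fifteen bytes in UTF-8, which is why treating the two string types as interchangeable only works for ASCII text.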

Cheers

Tom

Sugita Shinsuke

May 27, 2014, 12:02:13 AM
to django...@googlegroups.com, teva...@googlemail.com
Hello Tom

I finally resolved the problem.

I passed shell=False to subprocess.Popen
and built cmd as a list instead of a single string.

That resolved it.

Thank you.
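
The exact code was not posted, so the following is only a sketch of what the fix described above might look like, assuming Python 3 and the argument order from the first message. build_java_command and run are hypothetical helper names, and the child process used for the demonstration is a stand-in Python one-liner, not the real Java program.

```python
# -*- coding: utf-8 -*-
import subprocess
import sys


def build_java_command(class_path, java_class, text, file_name):
    """Build the argument list; each value stays a separate argument."""
    return [
        "java", "-cp", class_path, java_class,
        text.encode("utf-8"),  # explicit bytes, passed through unchanged
        file_name,
    ]


def run(cmd):
    """Run cmd without a shell and return (exit code, stdout bytes)."""
    proc = subprocess.Popen(cmd, shell=False,
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, _err = proc.communicate()
    return proc.returncode, out


text = u"あいうえお"
java_cmd = build_java_command("/path/to/", "javaprogram", text, "filename")

# Stand-in child (an assumption for demonstration only): it prints, as hex,
# the bytes it received for its first argument.
echo_code = "import os, sys; sys.stdout.write(os.fsencode(sys.argv[1]).hex())"
returncode, received = run([sys.executable, "-c", echo_code, text.encode("utf-8")])

# True when the child saw exactly the UTF-8 bytes that were sent.
print(received.decode("ascii") == text.encode("utf-8").hex())
```

With a list and shell=False no shell re-parses the command line, and encoding the text explicitly means the locale of the calling process is not consulted for that argument. How the JVM decodes the argument on its side is outside the scope of this sketch.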


On Thursday, May 8, 2014 at 5:45:52 PM UTC+9, Tom Evans wrote: