
Can't print Chinese to HTTP


Gnarlodious

Nov 30, 2009, 7:36:17 AM
to pytho...@python.org
Hello.
The upgrade to Python 3.1 has been a disaster so far. I can't figure out how to print Chinese to a browser. If my script is:

#!/usr/bin/python
print("Content-type:text/html\n\n")
print('晉')

the Chinese string simply does not print. It works in interactive Terminal with no problem, and it also works in Python 2.6 (which my server is still running) in 4 different browsers. What am I doing wrong? BTW, I searched Google for 2 days and found no solution; if this doesn't get solved soon I will have to roll back to 2.6.

Thanks for any clue.

-- Gnarlie
http://Gnarlodious.com


"Martin v. Löwis"

Nov 30, 2009, 7:53:13 AM
to

In the CGI case, Python cannot figure out what encoding to use for
output, so it raises an exception. This exception should show up in
the error log of your web server, please check.

One way of working around this problem is to encode the output
explicitly:

#!/usr/bin/python
print("Content-type:text/plain;charset=utf-8\n\n")
sys.stdout.buffer.write('晉\n'.encode("utf-8"))

FWIW, the Content-type in your example is wrong in two ways:
what you produce is not HTML, and the charset parameter is
missing.
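
For reference, a minimal sketch of the explicit-encoding workaround described above, assuming Python 3. The helper name `build_response` and its parameters are made up for illustration and are not part of any library. The idea is to encode the header and the body together and send both through the binary stream, so nothing depends on the encoding Python guesses for stdout:

```python
import sys


def build_response(body, content_type="text/plain"):
    """Return a complete CGI response as UTF-8 bytes (hypothetical helper)."""
    # One header line, then the blank line that ends the header section.
    header = "Content-Type: {0}; charset=utf-8\n\n".format(content_type)
    return (header + body).encode("utf-8")


def main():
    # Write bytes directly; sys.stdout.buffer bypasses the text layer
    # and therefore the encoding that Python picked for sys.stdout.
    sys.stdout.buffer.write(build_response("晉\n"))
    sys.stdout.buffer.flush()
```

A CGI script would call `main()` as its last step. Because every byte goes through `sys.stdout.buffer`, there is no mixing of `print()` and raw writes.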

Regards,
Martin

Gnarlodious

Nov 30, 2009, 12:05:16 PM
to
Thanks for the help, but it doesn't work. All I get is an error like:

UnicodeEncodeError: 'ascii' codec can't encode character '\\u0107' in
position 0: ordinal not in range(128)

It does work in Terminal interactively, after I import the sys module.
But my script doesn't act the same. Here is my entire script:

#!/usr/bin/python
print("Content-type:text/plain;charset=utf-8\n\n")

import sys


sys.stdout.buffer.write('晉\n'.encode("utf-8"))

All I get is the despised "Internal Server Error" with Console
reporting:

malformed header from script. Bad header=\xe6\x99\x89

Strangely, if I run the script in Terminal it acts as expected.

This is OSX 10.6.2, Python 3.1.1.
And it is frustrating because my entire website is hung up on this one
line I have been working on for 5 days.

-- Gnarlie
http://Gnarlodious.com

Aahz

Nov 30, 2009, 1:10:25 PM
to
In article <e60da505-ac24-4307...@1g2000vbm.googlegroups.com>,

Gnarlodious <gnarl...@gmail.com> wrote:
>
>Thanks for the help, but it doesn't work. All I get is an error like:
>
>UnicodeEncodeError: 'ascii' codec can't encode character '\\u0107' in
>position 0: ordinal not in range(128)

No time to give you more info, but you probably need to change the
encoding of sys.stdout.
--
Aahz (aa...@pythoncraft.com) <*> http://www.pythoncraft.com/

The best way to get information on Usenet is not to ask a question, but
to post the wrong information.

Lie Ryan

Nov 30, 2009, 1:32:27 PM
to
On 12/1/2009 4:05 AM, Gnarlodious wrote:
> Thanks for the help, but it doesn't work. All I get is an error like:
>
> UnicodeEncodeError: 'ascii' codec can't encode character '\\u0107' in
> position 0: ordinal not in range(128)

The error says it all; you're trying to encode the chinese character
using 'ascii' codec.

> malformed header from script. Bad header=\xe6\x99\x89

Hmmm... strange. The \xe6\x99\x89 happens to coincide with UTF-8
representation of 晉. Why is your content becoming a header?

> #!/usr/bin/python
Do you know exactly which Python version gets called by this
hashbang? You mentioned that you're using Python 3, but I'm not sure
that this hashbang will invoke python3 (unless Mac OSX has moved
ahead of the Linux distros and made Python 3 the default python).

> Strangely, if I run the script in Terminal it acts as expected.

I think I see it now. You're invoking python3 in the terminal, but your
server invokes Python 2. Python 2 uses byte-based string literals, while
Python 3 uses Unicode-based string literals. When you try
'晉\n'.encode("utf-8"), Python 2 first tries to decode the byte string
using the 'ascii' decoder, causing the exception.
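
To make the str/bytes distinction concrete, here is a small sketch, assuming Python 3 (the variable names are only illustrative). It shows that the one-character string becomes the three bytes that appear in the Apache log, and that the ASCII codec cannot represent it:

```python
text = "晉"                     # str: a sequence of Unicode code points
data = text.encode("utf-8")    # bytes: what is actually sent to the server

# One code point, three bytes in UTF-8.
print(type(text).__name__, len(text))
print(type(data).__name__, len(data))

# Encoding with the ASCII codec fails for any non-ASCII character.
try:
    text.encode("ascii")
    ascii_error = None
except UnicodeEncodeError as exc:
    ascii_error = str(exc)
```

The bytes in `data` are `\xe6\x99\x89`, the same sequence reported in the "malformed header" message.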

Ned Deily

Nov 30, 2009, 2:00:12 PM
to pytho...@python.org

> It does work in Terminal interactively, after I import the sys module.
> But my script doesn't act the same. Here is my entire script:
>
> #!/usr/bin/python
> print("Content-type:text/plain;charset=utf-8\n\n")
> import sys

> sys.stdout.buffer.write('晉\n'.encode("utf-8"))


>
> All I get is the despised "Internal Server Error" with Console
> reporting:
>
> malformed header from script. Bad header=\xe6\x99\x89
>
> Strangely, if I run the script in Terminal it acts as expected.
>
> This is OSX 10.6.2, Python 3.1.1.

Are you sure you are actually using Python 3? /usr/bin/python is the
path to the Apple-supplied python 2.6.1. If you installed Python 3.1.1
using the python.org OS X installer, the path should be
/usr/local/bin/python3
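
One way to settle the question of which interpreter the server really runs is to have the CGI script report it. The following is only a sketch; `interpreter_report` is a hypothetical helper name, and it uses nothing beyond the standard `sys` module:

```python
import sys


def interpreter_report():
    """Describe the interpreter that is executing this script."""
    lines = [
        "executable: " + str(sys.executable),
        "version: " + sys.version.split()[0],
        "stdout encoding: " + str(getattr(sys.stdout, "encoding", None)),
    ]
    return "\n".join(lines) + "\n"


def main():
    # Header first, then a blank line, then the report as the body.
    sys.stdout.write("Content-Type: text/plain; charset=utf-8\n\n")
    sys.stdout.write(interpreter_report())
```

Calling `main()` from a script served by Apache shows the path, the version, and the stdout encoding as seen under the web server, which can differ from what Terminal reports.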

--
Ned Deily,
n...@acm.org

exa...@twistedmatrix.com

Nov 30, 2009, 2:32:31 PM
to Gnarlodious, pytho...@python.org
On 05:05 pm, gnarl...@gmail.com wrote:
>Thanks for the help, but it doesn't work. All I get is an error like:
>
>UnicodeEncodeError: 'ascii' codec can't encode character '\\u0107' in
>position 0: ordinal not in range(128)
>
>It does work in Terminal interactively, after I import the sys module.
>But my script doesn't act the same. Here is my entire script:
>
>#!/usr/bin/python
>print("Content-type:text/plain;charset=utf-8\n\n")
>import sys
>sys.stdout.buffer.write('晉\n'.encode("utf-8"))

>
>All I get is the despised "Internal Server Error" with Console
>reporting:
>
>malformed header from script. Bad header=\xe6\x99\x89

As the error suggests, you're writing 晉 to the headers section of the
response. This is because you're not ending the headers section with a
blank line. Lines in HTTP end with \r\n, not with just \n.
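
A rough way to see what ends up in the header section is to split the script's raw output at the first blank line. This is a simplified model for illustration only, not Apache's actual parser, and `split_cgi_output` is a made-up name. It accepts both LF and CRLF line endings:

```python
def split_cgi_output(raw):
    """Split raw CGI output into (header_lines, body) at the first blank line.

    Returns (None, raw) when no blank line terminates the header section.
    """
    candidates = []
    for sep in (b"\r\n\r\n", b"\n\n"):
        pos = raw.find(sep)
        if pos != -1:
            candidates.append((pos, sep))
    if not candidates:
        return None, raw
    pos, sep = min(candidates)          # earliest blank line wins
    return raw[:pos].splitlines(), raw[pos + len(sep):]


# Output in the intended order: one header line, then the body.
good = b"Content-type:text/plain;charset=utf-8\n\n\xe6\x99\x89\n"
# Output where the body bytes were emitted before the header.
bad = b"\xe6\x99\x89\nContent-type:text/plain;charset=utf-8\n\n"

good_headers, good_body = split_cgi_output(good)
bad_headers, bad_body = split_cgi_output(bad)
```

In the second case the first "header" line is `\xe6\x99\x89`, which matches the bad header the server complains about.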

Have you considered using something with fewer sharp corners than CGI?
You might find it more productive.

Jean-Paul

Gnarlodious

Nov 30, 2009, 10:24:54 PM
to
> you probably need to change the encoding of sys.stdout
>>> sys.stdout.encoding
'UTF-8'

>> #!/usr/bin/python

> do you know what python version, exactly, that gets called by this
> hashbang?

Verified in HTTP:
>>> print(sys.version)
3.1.1
Is it possible modules are getting loaded from my old Python?

I symlinked to the new Python, and no I do not want to roll it back
because it is work (meaning I would have to type "sudo").
ls /usr/bin/python
lrwxr-xr-x 1 root wheel 63 Nov 20 21:24 /usr/bin/python -> /Library/Frameworks/Python.framework/Versions/3.1/bin/python3.1
Ugh, I have not been able to program in 11 days.

Now I remember doing it that way because I could not figure out how to
get Apache to find the new Python.

ls /usr/local/bin/python3.1
lrwxr-xr-x 1 root wheel 71 Nov 20 08:19 /usr/local/bin/python3.1 -> ../../../Library/Frameworks/Python.framework/Versions/3.1/bin/python3.1

So they are both pointing to the same Python.


And yes, I would prefer easier http scripting, but don't know one.

-- Gnarlie

Gnarlodious

Dec 1, 2009, 8:27:22 AM
to
On Nov 30, 5:53 am, "Martin v. Löwis" wrote:

> #!/usr/bin/python
> print("Content-type:text/plain;charset=utf-8\n\n")
> sys.stdout.buffer.write('晉\n'.encode("utf-8"))

Does this work for anyone? Because all I get is a blank page. Nothing.
If I can establish what SHOULD work, maybe I can diagnose this
problem.

-- Gnarlie

Lie Ryan

Dec 1, 2009, 10:36:09 AM
to

with a minor fix (import sys) that runs without errors in Python 3.1
(Vista), but the result is a bit disturbing...

--------------------------
晉
Content-type:text/plain;charset=utf-8
<BLANKLINE>
<BLANKLINE>
--------------------------

(is this a bug? or just undefined behavior?)

the following works correctly in python 3.1:

---------------------------
#!/usr/bin/python
import sys
print = lambda s: sys.stdout.buffer.write(s.encode('utf-8'))


print("Content-type:text/plain;charset=utf-8\n\n")

print('晉\n')
----------------------------

(and that code will definitely fail with python2 because of the print
assignment, an insurance if your server happens to be misconfigured to
run python2)

Gnarlodious

Dec 1, 2009, 10:51:19 AM
to
On Dec 1, 8:36 am, Lie Ryan wrote:

> #!/usr/bin/python
> import sys
> print = lambda s: sys.stdout.buffer.write(s.encode('utf-8'))
> print("Content-type:text/plain;charset=utf-8\n\n")
> print('晉\n')

HA! IT WORKS! Thank you thank you thank you. I don't understand the
lambda functionality but will figure it out. BTW this is OSX 10.6 and
Python 3.1.1.

Again, thank you for the help.

-- Gnarlie

Ned Deily

Dec 1, 2009, 2:07:53 PM
to pytho...@python.org
In article
<bc8d4390-9f36-47c6...@m26g2000yqb.googlegroups.com>,
Gnarlodious <gnarl...@gmail.com> wrote:

> I symlinked to the new Python, and no I do not want to roll it back
> because it is work (meaning I would have to type "sudo").
> ls /usr/bin/python
> lrwxr-xr-x 1 root wheel 63 Nov 20 21:24 /usr/bin/python -> /Library/Frameworks/Python.framework/Versions/3.1/bin/python3.1
> Ugh, I have not been able to program in 11 days.

You should *not* do this. The files in /usr/bin are installed and
controlled by Apple and, in particular, /usr/bin/python is the Apple
supplied python. By changing /usr/bin/python, you are risking incorrect
operation of other system programs that may depend on it plus it is
quite likely that an OS X software update will overwrite this location
breaking your applications. Use /usr/local/bin/python3.1 instead.

--
Ned Deily,
n...@acm.org

Terry Reedy

Dec 1, 2009, 5:06:18 PM
to pytho...@python.org
Gnarlodious wrote:
> On Dec 1, 8:36 am, Lie Ryan wrote:
>
>> #!/usr/bin/python
>> import sys
>> print = lambda s: sys.stdout.buffer.write(s.encode('utf-8'))

This is almost exactly the same as

def print(s): return sys.stdout.buffer.write(s.encode('utf-8'))

except that the latter gives better error tracebacks.
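
A small sketch of that equivalence, using an in-memory `io.BytesIO` as a stand-in for `sys.stdout.buffer` so it can be run anywhere (the names `write_lambda` and `write_def` are illustrative only):

```python
import io

buffer = io.BytesIO()   # stand-in for sys.stdout.buffer

# The lambda form from the earlier post.
write_lambda = lambda s: buffer.write(s.encode("utf-8"))


# The equivalent def form.
def write_def(s):
    return buffer.write(s.encode("utf-8"))


write_lambda("晉\n")
write_def("晉\n")

# The only visible difference is the function's name, which is what
# shows up in tracebacks.
names = (write_lambda.__name__, write_def.__name__)
```

Both calls write the same four bytes; the lambda is reported as `<lambda>` in a traceback, while the def carries its own name.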

>> print("Content-type:text/plain;charset=utf-8\n\n")
>> print('晉\n')
>
> HA! IT WORKS! Thank you thank you thank you. I don't understand the
> lambda functionality but will figure it out.

Nothing to do with lambda, really. See above.

tjr


Gnarlodious

Dec 4, 2009, 10:41:06 PM
to
On Dec 2, 11:58 pm, Dennis Lee Bieber wrote:

>         Have you tried
>
>         sys.stdout.write("Content-type:text/plain;charset=utf-8\r\n\r\n")

Yes I tried that when it was suggested, to no avail. All I get is
"Internal server error". All I can imagine is that there is no
"sys.stdout.write" in my Python. No idea why.

-- Gnarlie K5ZN

Gnarlodious

Dec 4, 2009, 10:57:54 PM
to
On Dec 1, 3:06 pm, Terry Reedy wrote:
> def print(s): return sys.stdout.buffer.write(s.encode('utf-8'))

Here is a better solution that lets me send any string to the
function:

def print(html): return sys.stdout.buffer.write(("Content-type:text/plain;charset=utf-8\n\n"+html).encode('utf-8'))

Why this changed in Python 3 I do not know, nor why it was nowhere to
be found on the internet.

Can anyone explain it?

Anyway, I hope others with this problem can find this solution.

-- Gnarlie

Lie Ryan

Dec 5, 2009, 5:54:51 AM
to
On 12/5/2009 2:57 PM, Gnarlodious wrote:
> On Dec 1, 3:06 pm, Terry Reedy wrote:
>> def print(s): return sys.stdout.buffer.write(s.encode('utf-8'))
>
> Here is a better solution that lets me send any string to the
> function:
>
> def print(html): return sys.stdout.buffer.write(("Content-type:text/plain;charset=utf-8\n\n"+html).encode('utf-8'))

No, that's wrong. You're serving HTML with Content-type:text/plain; it
should've been text/html or application/xhtml+xml (though the latter is
technically correct, some older browsers have problems with it).

> Why this changed in Python 3 I do not know, nor why it was nowhere to
> be found on the internet.
>
> Can anyone explain it?

Python 3's str() is what was Python 2's unicode().
Python 2's str() turned into Python 3's bytes().

Python 3's print() now takes a unicode string, which is the regular string.

Because of the switch to unicode str, a simple print('晉') should've
worked flawlessly if your terminal can accept the character, but the
problem is your terminal does not.

The correct fix is to fix your terminal's encoding.

In Windows, due to the prompt's poor support for Unicode, the only real
solution is to switch to a better terminal.

Another workaround is to use a real file:

import sys
f = open('afile.html', 'w', encoding='utf-8')
print("晉", file=f)
sys.stdout = f
print("晉")

or slightly better is to rewrap the buffer with io.TextIOWrapper:
import sys, io
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding="utf-8")
print("晉")
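
The rewrapping idea can be tried without touching the real stdout. In this sketch an `io.BytesIO` plays the role of `sys.stdout.buffer`; the variable names are illustrative, and `newline="\n"` is passed only so the result is the same on every platform:

```python
import io

raw = io.BytesIO()   # stand-in for sys.stdout.buffer
out = io.TextIOWrapper(raw, encoding="utf-8", newline="\n")

print("Content-Type: text/html; charset=utf-8", file=out)
print(file=out)      # the blank line that ends the header section
print("晉", file=out)
out.flush()          # push the buffered text down to the byte stream

result = raw.getvalue()
```

Everything goes through one text wrapper here, so the header and the body reach the byte stream in the order they were printed.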

Alf P. Steinbach

Dec 5, 2009, 6:30:31 AM
to
* Lie Ryan:

> On 12/5/2009 2:57 PM, Gnarlodious wrote:
>> On Dec 1, 3:06 pm, Terry Reedy wrote:
>>> def print(s): return sys.stdout.buffer.write(s.encode('utf-8'))
>>
>> Here is a better solution that lets me send any string to the
>> function:
>>
>> def print(html): return sys.stdout.buffer.write(("Content-type:text/plain;charset=utf-8\n\n"+html).encode('utf-8'))
>
> No, that's wrong. You're serving HTML with Content-type:text/plain, it
> should've been text/html or application/xhtml+xml (though technically
> correct some older browsers have problems with the latter).
>
>> Why this changed in Python 3 I do not know, nor why it was nowhere to
>> be found on the internet.
>>
>> Can anyone explain it?
>
> Python 3's str() is what was Python 2's unicode().
> Python 2's str() turned into Python 3's bytes().
>
> Python 3's print() now takes a unicode string, which is the regular string.
>
> Because of the switch to unicode str, a simple print('晉') should've
> worked flawlessly if your terminal can accept the character, but the
> problem is your terminal does not.
>
> The correct fix is to fix your terminal's encoding.
>
> In Windows, due to the prompt's poor support for Unicode, the only real
> solution is to switch to a better terminal.

A bit off-topic perhaps, but that last is a misconception. Windows' [cmd.exe]
does have poor support for UTF-8, in short it Does Not Work in Windows XP, and
probably does not work in Vista or Windows7 either. However, Windows console
windows have full support for the Basic Multilingual Plane of Unicode: they're
pure Unicode beasts.

Thus, the problem is an interaction between two systems that Do Not Work: the
[cmd.exe] program's practically non-existing support for UTF-8 (codepage 65001),
and the very unfortunate confusion of stream i/o and interactive i/o in *nix,
which has ended up as a "feature" (it's more like a design bug) in a lot of
programming languages stemming from *nix origins, and that includes Python.

Windows' "terminal", its console window support, is INNOCENT... :-)

In Windows, as opposed to *nix, interactive character i/o is separated at the
API level. There is integration with stream i/o, but the interactive i/o can be
accessed separately. This is the "console function" API.

So for interactive console i/o one solution could be some Python module for
interactive console i/o, on Windows internally using the Windows console
function API, which is fully Unicode (based on UCS-2, i.e. the BMP).

Cheers,

- Alf

Gnarlodious

Dec 5, 2009, 8:56:32 PM
to
On Dec 5, 3:54 am, Lie Ryan wrote:

> Because of the switch to unicode str, a simple print('晉') should've
> worked flawlessly if your terminal can accept the character, but the
> problem is your terminal does not.

There is nothing wrong with Terminal, Mac OSX supports Unicode from
one end to the other.
The problem is that your code works normally in Terminal but not in a
browser.

#!/usr/bin/python
import sys, io


print("Content-type:text/plain;charset=utf-8\n\n")

sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding="utf-8")
print("晉")

The browser shows "Server error", Apache 2 reports error:

[error] [client 127.0.0.1] malformed header from script. Bad header=\xe6\x99\x89: test.py

So far every way to print Unicode to a browser looks very un-Pythonic.
I am just wondering if I have a bug or am missing the right way
entirely.

-- Gnarlie

Lie Ryan

Dec 6, 2009, 2:15:58 AM
to
On 12/6/2009 12:56 PM, Gnarlodious wrote:
> On Dec 5, 3:54 am, Lie Ryan wrote:
>
>> Because of the switch to unicode str, a simple print('晉') should've
>> worked flawlessly if your terminal can accept the character, but the
>> problem is your terminal does not.
>
> There is nothing wrong with Terminal, Mac OSX supports Unicode from
> one end to the other.
> The problem is that your code works normally in Terminal but not in a
> browser.
>
> #!/usr/bin/python
> import sys, io
> print("Content-type:text/plain;charset=utf-8\n\n")
> sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding="utf-8")
> print("晉")
>
> The browser shows "Server error", Apache 2 reports error:
>
> [error] [client 127.0.0.1] malformed header from script. Bad header=\xe6\x99\x89: test.py

As I posted before, for some reason it is not possible to mix
writing with print() and sys.stdout.buffer. On my machine, the output
got mixed up:

--------------------------
晉
Content-type:text/plain;charset=utf-8
<BLANKLINE>
<BLANKLINE>
--------------------------

notice that the chinese character is on top of the header. I guess this
is due to the buffering from print.
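
The reordering can be reproduced in memory. This sketch assumes CPython's default `io` implementation, where text written to a `TextIOWrapper` is held until it is flushed; an `io.BytesIO` stands in for `sys.stdout.buffer` and the wrapper stands in for `sys.stdout`:

```python
import io

raw = io.BytesIO()                                            # role of sys.stdout.buffer
text = io.TextIOWrapper(raw, encoding="utf-8", newline="\n")  # role of sys.stdout

text.write("Content-type:text/plain;charset=utf-8\n\n")  # held in the text layer
raw.write("晉\n".encode("utf-8"))                         # goes straight to the bytes
text.flush()                                              # header only arrives now

mixed = raw.getvalue()
```

Under that assumption the body bytes land in front of the header, which is the same ordering shown in the output above.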

> So far every way to print Unicode to a browser looks very un-Pythonic.
> I am just wondering if I have a bug or am missing the right way
> entirely.

My *guess* is that Apache does not request a UTF-8 stdout. When run in the
Terminal, the Terminal requests UTF-8 stdout from Python and the script
runs correctly. I'm not too familiar with Apache's internals, nor with how
Python 3 determines its stdout encoding; you might want to ask on Apache's
mailing list whether they have seen a similar case.

PS: You might also want to look at this:
http://stackoverflow.com/questions/984014/python-3-is-using-sys-stdout-buffer-write-good-style

it says to try setting your PYTHONIOENCODING environment variable to "utf8"
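
The effect of PYTHONIOENCODING can be checked from the command line side by running a child interpreter with its stdout connected to a pipe, which is roughly the situation under CGI. This is a sketch only; `run_with_io_encoding` is a hypothetical helper, and how the variable gets set for Apache itself depends on the server configuration:

```python
import os
import subprocess
import sys


def run_with_io_encoding(code, encoding="utf-8"):
    """Run `code` in a child Python with PYTHONIOENCODING set; return raw stdout."""
    env = dict(os.environ)
    env["PYTHONIOENCODING"] = encoding
    return subprocess.check_output([sys.executable, "-c", code], env=env)


# The character is passed as an escape so the command line stays pure ASCII.
output = run_with_io_encoding("print('\\u6649')")
```

With the variable set to `utf-8`, the child's `print()` emits the UTF-8 bytes for 晉 followed by a newline, regardless of the locale the parent runs in.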

Dave Angel

Dec 6, 2009, 5:22:48 AM
to Gnarlodious, pytho...@python.org
Gnarlodious wrote:
> On Dec 5, 3:54 am, Lie Ryan wrote:
>
>
>> Because of the switch to unicode str, a simple print('晉') should've
>> worked flawlessly if your terminal can accept the character, but the
>> problem is your terminal does not.
>>
>
> There is nothing wrong with Terminal, Mac OSX supports Unicode from
> one end to the other.
> The problem is that your code works normally in Terminal but not in a
> browser.
>
> #!/usr/bin/python
> import sys, io
> print("Content-type:text/plain;charset=utf-8\n\n")
> sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding="utf-8")

> print("晉")
>
> The browser shows "Server error", Apache 2 reports error:
>
> [error] [client 127.0.0.1] malformed header from script. Bad header=\xe6\x99\x89: test.py

>
> So far every way to print Unicode to a browser looks very un-Pythonic.
> I am just wondering if I have a bug or am missing the right way
> entirely.
>
> -- Gnarlie
>
>
You change the meaning of sys.stdout without flushing the previous
instance. So of course buffering can mess you up. If you want to
change the encoding, do it at the beginning of the script.

#!/usr/bin/python
import sys, io


sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding="utf-8")

print("Content-type:text/plain;charset=utf-8\n\n")
print("晉")


(You probably could use sys.stdout.flush() before reassigning, but doing
it at the beginning is better for several reasons.)
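
For the flush-before-reassigning variant, here is a sketch using in-memory streams (an `io.BytesIO` stands in for `sys.stdout.buffer`, and the names `old` and `new` are illustrative). The old wrapper is flushed and detached before the new one is created, so nothing written earlier is left behind in a buffer:

```python
import io

raw = io.BytesIO()                                           # role of sys.stdout.buffer
old = io.TextIOWrapper(raw, encoding="ascii", newline="\n")  # role of the original sys.stdout

old.write("Content-type:text/plain;charset=utf-8\n\n")
old.flush()              # make sure the header reaches the byte stream first
binary = old.detach()    # release the byte stream; `old` is unusable afterwards

new = io.TextIOWrapper(binary, encoding="utf-8", newline="\n")
new.write("晉\n")
new.flush()

ordered = raw.getvalue()
```

With the flush in place the header comes first and the body second. Rewrapping at the very top of the script, as suggested above, avoids the need for this bookkeeping.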

DaveA
