preferred way to set encoding for print
_wolf  
Newsgroups: comp.lang.python
From: _wolf <wolfgang.l...@gmail.com>
Date: Tue, 15 Sep 2009 06:28:06 -0700 (PDT)
Local: Tues, Sep 15 2009 9:28 am
Subject: preferred way to set encoding for print
hi folks,

i am doing my first steps in the wonderful world of python 3.

some things are good.
some things have to be relearned.
some things drive me crazy.

sadly, i'm working on a windows box. which, in germany, entails that
python thinks it to be a good idea to take cp1252 as the default
encoding.

so just coz i got my box in germany means i can never print out a
chinese character? say what?

i have no troubles with people configuring their python installation
to use any encoding in the world, but wouldn't it have been less of a
surprise to just assume utf-8 for any file in/output? after all, it is
already the default for python source files as far as i understand.
someone might think they're clever to sniff into the system and make
the somewhat educated guess that this dude's using cp1252 for his
files. but they would be wrong.

so: how can i tell python, in a configuration or using a setting in
sitecustomize.py, or similar, to use utf-8 as a default encoding?
there used to be a trick to say `reload(sys);sys.setdefaultencoding
('utf-8')`, but that has no effect in py3.0.1. also, i cannot set
`sys.stdout.encoding`; is there a way to re-open that stream with a
different encoding?

in all, i believe it is quite unsettling to me to see that, on my py3
installation,

sys.getdefaultencoding() == 'utf-8'
sys.stdout.encoding == 'cp1252'
locale.getlocale() == (None, None)
locale.getdefaultlocale() == ('de_DE', 'cp1252')

which to me makes as much sense as a blackcurrant tart thrown into
space. worse,

locale.setlocale( locale.LC_ALL, locale.getdefaultlocale() )

results in

locale.Error: unsupported locale setting

this bloody thing doesn't accept its *own* output. attempts to feed
that locale beast with anything but the empty string or 'C' were all
doomed. it would take a very patient and eloquent person to explain
that in a credible fashion to me. my word for this is, 'broken'.
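
The settings compared above can be collected with a short script. This is a minimal sketch, not part of the original post: `encoding_report` is a hypothetical helper name, and `locale.getpreferredencoding(False)` stands in for `locale.getdefaultlocale()`, which newer Python versions deprecate.

```python
import locale
import sys


def encoding_report():
    """Collect the encoding-related settings discussed above into a dict."""
    return {
        "sys.getdefaultencoding()": sys.getdefaultencoding(),
        # stdout may have been replaced by an object without .encoding
        "sys.stdout.encoding": getattr(sys.stdout, "encoding", None),
        "locale.getlocale()": locale.getlocale(),
        # Assumption: used here instead of locale.getdefaultlocale()
        "locale.getpreferredencoding(False)": locale.getpreferredencoding(False),
    }


if __name__ == "__main__":
    for name, value in encoding_report().items():
        print(f"{name:40} {value!r}")
```

On Python 3 the first value should always be 'utf-8'; the others depend on the platform and terminal, which is the mismatch the post complains about.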

i would very much like to rid myself of these considerations. just say
it's all utf-8, wash'n'go.

my attempts at changing python's mind using the locale module have
failed so far. otherwise, i for one don't want to touch that locale
thing with a very long pole. as far as i can see, it does not work as
documented. the platform dependencies are also a clear OFF LIMITS sign
to me.

any suggestions?

cheers,

~flow


 
Mark Tolonen  
Newsgroups: comp.lang.python
From: "Mark Tolonen" <metolone+gm...@gmail.com>
Date: Tue, 15 Sep 2009 22:16:07 -0700
Local: Wed, Sep 16 2009 1:16 am
Subject: Re: preferred way to set encoding for print

"_wolf" <wolfgang.l...@gmail.com> wrote in message

news:22991c72-d00f-45cd-9bf7-0b80fc4319bd@k26g2000vbp.googlegroups.com...

What specifically do you want to do?  I work with Chinese all the time on a
U.S. Windows system.  Do you want to print Chinese characters in a console
window?  In a Python IDE?  FYI, I don't use the locale module for much at
all.

I can't type or print Chinese to a console window unless I change Control
Panel, Regional and Language Options, Advanced Tab, Language for non-Unicode
Programs to a Chinese selection (and reboot).  Then the default
sys.stdout.encoding is something like cp936.

The Pythonwin IDE in the latest version of pywin32, however, supports UTF-8
in its interactive window and displays Chinese fine.

Setting PYTHONIOENCODING overrides the encoding used for stdin/stdout/stderr
(See the Python help for details), but if your terminal doesn't support the
encoding that won't help.

Let me know what you're trying to do.

-Mark
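
A minimal sketch of the PYTHONIOENCODING override described above, assuming a child interpreter is started with the variable set. `child_stdout_encoding` is a hypothetical helper name, not something from the thread. It only reports which encoding the child's stdout ends up with; whether a terminal can display that encoding is a separate matter.

```python
import codecs
import os
import subprocess
import sys


def child_stdout_encoding(setting):
    """Start a child Python with PYTHONIOENCODING=setting and return the
    normalized name of the encoding its sys.stdout reports."""
    env = dict(os.environ, PYTHONIOENCODING=setting)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    name = result.stdout.decode("ascii").strip()
    # Normalize spellings such as "UTF8" or "utf_8" to the codec's canonical name.
    return codecs.lookup(name).name


if __name__ == "__main__":
    print(child_stdout_encoding("utf-8"))
    print(child_stdout_encoding("cp437"))
```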


 
Discussion subject changed to "(A Possible Solution) Re: preferred way to set encoding for print" by ~flow
~flow  
Newsgroups: comp.lang.python
From: "~flow" <wolfgang.l...@gmail.com>
Date: Wed, 16 Sep 2009 12:39:55 -0700 (PDT)
Local: Wed, Sep 16 2009 3:39 pm
Subject: (A Possible Solution) Re: preferred way to set encoding for print
On Sep 16, 7:16 am, "Mark Tolonen" <metolone+gm...@gmail.com> wrote:

> Setting PYTHONIOENCODING overrides the encoding used for stdin/stdout/stderr
> (See the Python help for details), but if your terminal doesn't support the
> encoding that won't help.

thx for these two tips. of course, it was a bit misleading of me to
complain that a cp850 terminal can't display chinese characters from
python----it cannot do it at all, of course.

i've gone on to experiment. what i do not want is python to stop
execution when an encoding error occurs on printing and perhaps
logging. so far, i used to do this by convincing python to use utf-8
in any and all cases, and then live with the amount of garbage that
appears on screen when using cp850 and cp1252 terminals.

what has changed in python is that they now somehow find out about the
terminal's encoding, and then put that encoding into place and defend
it with teeth and claws. it is simply not easy to take control of that
setting.

this is in itself unfortunate; i believe that users should have a
right to determine what to do in case of stdout encoding problems.
these are a little different from i-wrote-to-that-file-and-boom
experiences. *there* the encoding exception is fully warranted, and
could be easily fixed by allowing a less-than-strict encoding mode.

but print is different, and of all situations where encoding errors
can occur, this is the hardest to take hold of. and much more so in
python3 it seems than in python2.

printing to the screen is often purely meta-informative in nature, a
side-effect e.g. of a webserver really doing web pages. i don't want
to bring my entire system down just because some output into some
terminal in the back orifice produced some amount of garbage. maybe
only a single chinese character amongst thousands of done this done
that red tape.

i think web browsers are a good example here. i don't know whether it
was a good idea to let clients reassemble broken web pages in an order
as they see fit, but the policy to just output broken encoding
character instances instead of terminating the browser process with a
lengthy stacktrace was probably somehow good for the popularity of
the web as we know it.

my current patch looks like this:

  import sys

  class Stdout_writer_with_ncrs( object ):

    def write( self, p ):
      """See to it that all write encodings are done using numerical
      character references (NCRs), which circumvents Python’s default
      behavior of raising an exception whenever it encounters an
      unrepresentable character while printing."""
      # encoding of the original stdout stream
      enc   = sys.__stdout__.encoding
      # make sure we are dealing with text
      p     = p if isinstance( p, str ) else str( p )
      # replace unencodable characters with NCRs, then turn back into text
      p     = p.encode( enc, 'xmlcharrefreplace' ).decode( enc )
      sys.__stdout__.write( p )

  sys.stdout = Stdout_writer_with_ncrs()

this method picks up anything to be printed, makes sure it is a text,
and then encodes it to the terminal encoding using numerical character
references (NCRs), then decodes it again since the underlying wrapper
class wants to do encodings itself and refuses bytes in place of
strings to be sent (again, this is not nice: an array of byte values
sent to the print method is a clear request to send exactly those
bytes, verbatim, one by one, to the terminal. no mucking around with
my bytes, pls! maybe i can implement that in the code above, too.)

of course, this simplistic scaffold will break if anyone uses
sys.stdout for anything but sys.stdout.write(), but so far it
has worked fine despite being a defective, tiny shim. maybe
inheriting from sys.stdout.__class__ would help.
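
One way to make such a shim less fragile is to forward every other attribute to the wrapped stream instead of inheriting from it. The following is a minimal sketch under that assumption, not the code from the post: `NcrWriter` is a hypothetical name, and an in-memory cp1252 stream stands in for the terminal so the effect can be inspected.

```python
import io


class NcrWriter:
    """Wrap a text stream; unencodable characters become NCRs on write."""

    def __init__(self, stream):
        self._stream = stream

    def write(self, text):
        enc = self._stream.encoding
        # Encode with NCR replacement, then decode back to text, because
        # the underlying text stream only accepts str.
        safe = str(text).encode(enc, "xmlcharrefreplace").decode(enc)
        return self._stream.write(safe)

    def __getattr__(self, name):
        # Anything not defined here (flush, encoding, isatty, ...) is
        # forwarded to the wrapped stream.
        return getattr(self._stream, name)


# Stand-in for a cp1252 terminal (assumption: an in-memory buffer).
raw = io.BytesIO()
target = io.TextIOWrapper(raw, encoding="cp1252", newline="\n")
out = NcrWriter(target)
print("Hello \u5000\u5001", file=out)
out.flush()
captured = raw.getvalue()
# To use it for real output one would assign: sys.stdout = NcrWriter(sys.__stdout__)
```

Here `flush()` works although the wrapper never defines it, which is the case the plain shim does not cover.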


 
Discussion subject changed to "preferred way to set encoding for print" by Terry Reedy
Terry Reedy  
Newsgroups: comp.lang.python
From: Terry Reedy <tjre...@udel.edu>
Date: Wed, 16 Sep 2009 18:04:06 -0400
Local: Wed, Sep 16 2009 6:04 pm
Subject: Re: preferred way to set encoding for print

Mark Tolonen wrote:
>> ('utf-8')`, but that has no effect in py3.0.1. also, i cannot set

Even if not relevant to your immediate problem, if you can, upgrade to
3.1, with its many important bug fixes.

 
Discussion subject changed to "(A Possible Solution) Re: preferred way to set encoding for print" by Mark Tolonen
Mark Tolonen  
Newsgroups: comp.lang.python
From: "Mark Tolonen" <metolone+gm...@gmail.com>
Date: Wed, 16 Sep 2009 21:50:03 -0700
Local: Thurs, Sep 17 2009 12:50 am
Subject: Re: (A Possible Solution) Re: preferred way to set encoding for print

"~flow" <wolfgang.l...@gmail.com> wrote in message

news:643ca91c-b81c-483c-a8af-65c93b593e1f@r33g2000vbp.googlegroups.com...

>On Sep 16, 7:16 am, "Mark Tolonen" <metolone+gm...@gmail.com> wrote:
>> Setting PYTHONIOENCODING overrides the encoding used for stdin/stdout/stderr
>> (See the Python help for details), but if your terminal doesn't support the
>> encoding that won't help.
[snip]
>what has changed in python is that they now somehow find out about the
>terminal's encoding, and then put that encoding into place and defend
>it with teeth and claws. it is simply not easy to take control of that
>setting.

A couple more tips. PYTHONIOENCODING takes an optional error handler:

C:\>set PYTHONIOENCODING=cp437:xmlcharrefreplace
C:\>python
Python 3.1.1 (r311:74483, Aug 17 2009, 17:02:12) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.

>>> print('Hello \u5000\u5001')

Hello &#20480;&#20481;
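
The same session can be reproduced without a console by starting a child interpreter with the variable set and capturing its output. This is a minimal sketch under that assumption; `run_child` is a hypothetical helper name.

```python
import os
import subprocess
import sys


def run_child(code, io_setting):
    """Run `code` in a child Python with PYTHONIOENCODING=io_setting and
    return the bytes it wrote to stdout, stripped of the trailing newline."""
    env = dict(os.environ, PYTHONIOENCODING=io_setting)
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout.strip()


# Raw string: the \u escapes are interpreted by the child, not here.
CODE = r"print('Hello \u5000\u5001')"
output = run_child(CODE, "cp437:xmlcharrefreplace")

if __name__ == "__main__":
    print(output)
```

The part after the colon names the error handler; with `xmlcharrefreplace` the two characters that cp437 cannot represent come out as numeric character references instead of raising an exception.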

You can also write directly to stdout with byte strings (Note: my terminal
doesn't support UTF-8, but no error):

>>> import sys
>>> sys.stdout.buffer.write('\u5000'.encode('utf8'))

σÇÇ3
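
As far as I can tell, the trailing 3 there is the return value of write() echoed by the interactive prompt, i.e. the number of bytes written, and the characters before it are the three UTF-8 bytes as a cp437 terminal renders them. A minimal sketch, assuming an in-memory buffer in place of the real terminal:

```python
import io

# Stand-in for a cp437 console (assumption: an in-memory byte buffer).
raw = io.BytesIO()
stream = io.TextIOWrapper(raw, encoding="cp437", errors="strict")

payload = "\u5000".encode("utf-8")
# Writing bytes to the underlying buffer bypasses the text layer entirely,
# so the stream's encoding and error handler are never consulted.
written = stream.buffer.write(payload)

# What a cp437 terminal would display for those bytes.
shown = raw.getvalue().decode("cp437")

if __name__ == "__main__":
    print(payload, written, ascii(shown))
```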

-Mark


 
Miles Kaufmann  
Newsgroups: comp.lang.python
From: Miles Kaufmann <mile...@umich.edu>
Date: Wed, 16 Sep 2009 22:19:57 -0700
Local: Thurs, Sep 17 2009 1:19 am
Subject: Re: (A Possible Solution) Re: preferred way to set encoding for print
On Sep 16, 2009, at 12:39 PM, ~flow wrote:

>>> so: how can i tell python, in a configuration or using a setting in
>>> sitecustomize.py, or similar, to use utf-8 as a default encoding?

> [snip Stdout_writer_with_ncrs solution]

This should work:

     sys.stdout = io.TextIOWrapper(sys.stdout.buffer,
                                   encoding=sys.stdout.encoding,
                                   errors='xmlcharrefreplace')

http://mail.python.org/pipermail/python-list/2009-August/725100.html
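
A minimal sketch of what the re-wrapping above does, assuming an in-memory cp1252 stream in place of the real sys.stdout so the resulting bytes can be inspected. `rewrap` is a hypothetical helper name. The second half uses `TextIOWrapper.reconfigure()`, which exists only in Python 3.7 and later and so was not available when this thread was written.

```python
import io


def rewrap(stream):
    """Return a new text wrapper over the same byte buffer that replaces
    unencodable characters with numeric character references."""
    stream.flush()
    return io.TextIOWrapper(
        stream.buffer,
        encoding=stream.encoding,
        errors="xmlcharrefreplace",
        newline="\n",
    )


# Stand-in for a cp1252 stdout (assumption: an in-memory buffer).
raw = io.BytesIO()
strict = io.TextIOWrapper(raw, encoding="cp1252", newline="\n")
lenient = rewrap(strict)
lenient.write("euro: \u20ac, hanzi: \u5000\n")
lenient.flush()
result = raw.getvalue()

# Python 3.7+ alternative: change the error handler in place.
other_raw = io.BytesIO()
other = io.TextIOWrapper(other_raw, encoding="cp1252", newline="\n")
other.reconfigure(errors="xmlcharrefreplace")
other.write("\u5000")
other.flush()
other_result = other_raw.getvalue()
```

Characters that cp1252 can represent, such as the euro sign, are encoded normally; only the unrepresentable ones are replaced.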

-Miles


 