Encoding problem (accents with run script)

wgw

Jun 30, 2011, 5:00:34 PM
to leo-e...@googlegroups.com
This may not be a Leo problem (maybe it's a Qt problem? I'm using Ubuntu 10.04), but Leo is where it crops up, so here it is:

I can type accents in the body pane without a problem. So I can write this:

import sys

e = sys.getdefaultencoding()
g.es('encoding',e)

g.es("test: é")

But when I run the script in Leo (Ctrl-B) I get an error:

exception executing script
  File "/home/bill/.leo/scriptFile.py", line 10
SyntaxError: Non-ASCII character '\xc3' in file /home/bill/.leo/scriptFile.py on line 10, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details (scriptFile.py, line 10)
--------------------
  line 9:
* line 10: g.es("test: é")
  line 11: #@-leo
  line 12:

(Interesting that the error message prints the accent in the log pane without a problème!.... :)

I get that error even when the line with the offending accent is commented out. And if I delete that last line and run the script, it prints the encoding as utf-8.
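
For readers puzzled by the message: '\xc3' is the first byte of the two-byte UTF-8 encoding of "é". Python 2 decodes source files as ASCII unless a PEP 263 declaration says otherwise, and that check covers every line of the file, comments included, which is consistent with the error persisting after the line is commented out. The Python 3 sketch below uses a hypothetical helper, find_non_ascii, to list the offending bytes by line; it reports all of them, whereas Python 2 stops at the first one.

```python
def find_non_ascii(source: bytes) -> list:
    """Hypothetical helper: return (line_number, hex_byte) for each non-ASCII byte.

    Line numbers are 1-based, like the ones in Python's SyntaxError message.
    Comment lines are not skipped, because Python 2's ASCII check applies to them too.
    """
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for byte in line:  # iterating over bytes yields ints
            if byte > 0x7F:
                hits.append((lineno, hex(byte)))
    return hits


# "é" is two bytes in UTF-8; the first one is 0xc3, as in the error message.
print("é".encode("utf-8"))  # b'\xc3\xa9'

# A commented-out line still contains the bytes.
script = 'import sys\n# g.es("test: é")\n'.encode("utf-8")
print(find_non_ascii(script))  # [(2, '0xc3'), (2, '0xa9')]
```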

Under plain Python there is no problem (I have set the default encoding in sitecustomize.py):

bill@bill-laptop:~/Soft/leo-editor$ python
Python 2.6.5 (r265:79063, Apr 16 2010, 13:09:56)
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.getdefaultencoding()
'utf-8'
>>> print "é"
é
 
The workbook file is properly encoded (<?xml version="1.0" encoding="utf-8"?>), and so is mysettings.leo, as far as I can tell.

I have tried this both inside and outside screen (I use the leoscreen plugin), with LC_ALL=en_US.UTF8 so that all the LC_* settings are UTF-8.

I tried @encoding="utf-8"; no luck either.

So I am at a loss. Any suggestions?

Thanks!

wgw

Jun 30, 2011, 8:53:52 PM
to leo-e...@googlegroups.com
OK, here is the solution, which I suppose was obvious:

@first # -*- coding: utf-8 -*-

as the first line in the body.
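
The @first directive works because it puts the coding comment on the first line of the generated scriptFile.py, and PEP 263 only looks for the declaration on line 1 or 2. The default encoding set in sitecustomize.py (sys.getdefaultencoding()) is not consulted when Python decides how to decode a source file. As a sketch of the cookie rules, Python 3's tokenize.detect_encoding reports the encoding the interpreter would use. Note that Python 3 falls back to UTF-8 when there is no declaration, while Python 2.6 fell back to ASCII.

```python
import io
import tokenize


def source_encoding(source: bytes) -> str:
    """Return the encoding Python 3 would use to decode this source file."""
    # detect_encoding reads at most two lines, looking for a BOM or a PEP 263 cookie.
    encoding, _consumed_lines = tokenize.detect_encoding(io.BytesIO(source).readline)
    return encoding


with_cookie = b'# -*- coding: utf-8 -*-\ng.es("test: \xc3\xa9")\n'
latin1_cookie = b"# -*- coding: latin-1 -*-\nx = 1\n"
no_cookie = b"import sys\n"

print(source_encoding(with_cookie))    # utf-8
print(source_encoding(latin1_cookie))  # iso-8859-1 (normalized name)
print(source_encoding(no_cookie))      # utf-8 (Python 3 default; Python 2 assumed ASCII)
```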

Now I'm back on track.

So this goes into the enhancement bin: the ability to set the default encoding globally. (But I thought that was done through leosettings.leo...)

wgw

Jun 30, 2011, 9:03:34 PM
to leo-e...@googlegroups.com
PS:

I have the same problem with leoscreen, but of course that explicit-encoding solution won't work there. A global default encoding needs to be set somewhere.

So in leoscreen I can send

print "é"
print "e"

It comes out fine in the Python terminal, but this is what I get back in Leo with get_note:

>>> print "��>> print "e"
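
The two '�' characters (U+FFFD, the Unicode replacement character) are what appears when the two UTF-8 bytes of "é" are decoded with a codec that cannot handle them and errors are replaced rather than raised. The thread does not show how get_note reads the terminal, so the ASCII decode below is an assumption used only to reproduce the first part of the symptom in Python 3; it does not account for the rest of the mangled line.

```python
# The line as UTF-8 bytes, the way the terminal holds it.
raw = 'print "é"'.encode("utf-8")

# Decoding with a codec that rejects 0xc3 and 0xa9 (assumed here: ASCII), using
# errors="replace", turns each bad byte into U+FFFD, displayed as '�'.
wrong = raw.decode("ascii", errors="replace")
print(wrong)  # print "��"

# Decoding with the encoding the bytes were written in recovers the text.
right = raw.decode("utf-8")
print(right)  # print "é"
```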

Edward K. Ream

Jul 4, 2011, 4:43:54 PM
to leo-e...@googlegroups.com

This is a complex topic. If the Python settings sufficed, one would
think that the line:

# -*- coding: utf-8 -*-

would never be needed. But it is. That being so, I think the prudent
thing is to require the same line in Leo scripts.

Edward
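
Edward's suggestion could be enforced mechanically by whatever code writes the script file: check lines 1 and 2 for a PEP 263 declaration and add one if none is found. The helper below, ensure_coding_line, is hypothetical (it is not Leo's code) and uses the regular expression given in PEP 263.

```python
import re

# Regular expression from PEP 263; the declaration must be on line 1 or 2.
CODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def ensure_coding_line(script: str, encoding: str = "utf-8") -> str:
    """Hypothetical helper: return script with a PEP 263 coding declaration.

    If line 1 or 2 already declares an encoding, the script is returned unchanged.
    Otherwise the declaration goes on line 1, or on line 2 when line 1 is a
    '#!' shebang, which must stay first.
    """
    lines = script.splitlines(keepends=True)
    if any(CODING_RE.match(line) for line in lines[:2]):
        return script
    cookie = "# -*- coding: %s -*-\n" % encoding
    if lines and lines[0].startswith("#!"):
        first = lines[0] if lines[0].endswith("\n") else lines[0] + "\n"
        return first + cookie + "".join(lines[1:])
    return cookie + script


print(ensure_coding_line('g.es("test: é")\n'), end="")
# # -*- coding: utf-8 -*-
# g.es("test: é")
```

The declaration only helps if the file is then actually written in that encoding, for example with open(path, "w", encoding="utf-8") in Python 3.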

wgw

Jul 4, 2011, 7:40:36 PM
to leo-e...@googlegroups.com
Yes, encoding is a... complex beast.

I have Googled a bit, and this post (http://drj11.wordpress.com/2007/05/14/python-how-is-sysstdoutencoding-chosen/) gives the flavor of the problem:

Now I investigate a bit, Python’s behaviour is stubborn and arbitrary. When stdout is not a terminal then LC_CTYPE does not determine sys.stdout.encoding. A consequence is LC_CTYPE=en_GB.utf-8 python -c 'print u"\N{left-pointing double angle quotation mark}"' works, but piping it through tee gives a UnicodeEncodeError.

My guess is that somewhere in Leo (in something used by the leoscreen plugin in particular) there is a sys.stdout without a sys.stdout.encoding. Or, more probably, output is being piped to something that can't pick up the encoding from Python, so Python throws an error.
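
That guess matches how Python 2 behaves: when stdout is a pipe rather than a terminal, sys.stdout.encoding is None, and printing a unicode string falls back to the default encoding, normally ASCII. The Python 3 sketch below imitates this with an in-memory byte stream standing in for the pipe (write_through is a hypothetical helper): the same text either succeeds or raises UnicodeEncodeError depending only on the encoding the stream was created with.

```python
import io


def write_through(text: str, encoding: str) -> bytes:
    """Hypothetical helper: write text through a stream with a fixed encoding.

    io.BytesIO stands in for the pipe, and the TextIOWrapper plays the part of
    sys.stdout, whose encoding is chosen when the stream is created.
    """
    pipe = io.BytesIO()
    out = io.TextIOWrapper(pipe, encoding=encoding, newline="\n")  # no newline translation
    out.write(text)
    out.flush()
    return pipe.getvalue()


# The character from the quoted post, plus the accent from this thread.
text = "\N{LEFT-POINTING DOUBLE ANGLE QUOTATION MARK} é\n"

print(write_through(text, "utf-8"))  # b'\xc2\xab \xc3\xa9\n'

try:
    write_through(text, "ascii")
except UnicodeEncodeError as exc:
    print("ascii stream failed:", exc.reason)  # ascii stream failed: ordinal not in range(128)
```

Fixing the encoding up front, for example by setting PYTHONIOENCODING=utf-8 in the environment of the Python process doing the printing, is one common workaround.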

Tracking down such "errors" (they are really just interface issues between processes, I guess) is very difficult. (But I will keep an eye out...)

And there are simple enough workarounds. I mention this only to warn others about a possible trap. As long as you are working in ASCII, life is simple!