UTF-8 and old terminals (new startup banner)

Volker Braun

Jul 7, 2013, 10:05:29 PM
to sage-...@googlegroups.com
Frédéric Chapoton has written a patch at http://trac.sagemath.org/14733 that will beautify the Sage startup banner using some UTF-8 characters to draw the box. This will display incorrectly in terminals that do not support UTF-8. In that case, Sage still works but the box around the banner is garbled (most likely rendered by placeholder signs for non-ASCII characters).

Of course, various Sage source files are already UTF-8 encoded, usually because of non-ASCII characters in docstrings. These will never render correctly in ancient terminals, nor will editing such source files lead to much happiness. The question is essentially, do we want to make it clear to the user right at the beginning that his terminal is not up to the task or would we rather only have more subtle errors later?

Working terminals:
  * xterm
  * urxvt (the rxvt-unicode fork)
  * gnome-terminal
  * kterm

Not working:
  * Eterm (unless your distro integrates unofficial patches that are floating around)
  * aterm
  * rxvt and various clones that predate rxvt-unicode

Ivan Andrus

Jul 8, 2013, 12:32:06 AM
to sage-...@googlegroups.com
+1 to a UTF-8 banner.  

FWIW, Terminal.app and iTerm2.app (on OS X) also work.

-Ivan

Nils Bruin

Jul 8, 2013, 4:51:28 AM
to sage-...@googlegroups.com
On Monday, July 8, 2013 4:05:29 AM UTC+2, Volker Braun wrote:
Frédéric Chapoton has written a patch at http://trac.sagemath.org/14733 that will beautify the Sage startup banner using some UTF-8 characters to draw the box. This will display incorrectly in terminals that do not support UTF-8. In that case, Sage still works but the box around the banner is garbled (most likely rendered by placeholder signs for non-ASCII characters).

-1 to a UTF-8 banner. There's hardly anything in the sage command line that requires a UTF-8 capable terminal (all the colour stuff should shut off automatically for a "dumb" terminal). Why require it for *just* the banner?

More personally, I dislike the UTF-8 banner because it looks too nice. To me it doesn't fit with the simple prompt-and-return-value interface (see the banners of magma, maple, R, matlab, pari/GP, python, IPython, GAP, Singular). The "graphics"-looking lines are more suggestive of a menu-driven interface to me, like the text-based "dialog" interfaces. So to me, the "+-|"-built borders raise expectations more appropriate for what the Sage command line offers.

Julien Puydt

Jul 8, 2013, 4:55:52 AM
to sage-...@googlegroups.com
On 08/07/2013 10:51, Nils Bruin wrote:
> On Monday, July 8, 2013 4:05:29 AM UTC+2, Volker Braun wrote:
>
> Frédéric Chapoton has written a patch at
> http://trac.sagemath.org/14733 that
-1 too for the same reasons.

Keep the banner short and simple; people don't come to sage to enjoy
that sight!

Snark on #sagemath

Thierry Dumont

Jul 8, 2013, 5:28:23 AM
to sage-...@googlegroups.com
On 08/07/2013 10:51, Nils Bruin wrote:
> On Monday, July 8, 2013 4:05:29 AM UTC+2, Volker Braun wrote:
>
> Frédéric Chapoton has written a patch at
> http://trac.sagemath.org/14733 that
Is the banner of R so simple? It probably does not use UTF-8, but it
speaks your mother tongue (French in my case)...

Jean-Pierre Flori

Jul 8, 2013, 5:33:18 AM
to sage-...@googlegroups.com
IIRC the new GAP banner uses UTF-8.

On Monday, July 8, 2013 11:28:23 AM UTC+2, tdumont wrote:
On 08/07/2013 10:51, Nils Bruin wrote:
> On Monday, July 8, 2013 4:05:29 AM UTC+2, Volker Braun wrote:
>
>     Frédéric Chapoton has written a patch at

vdelecroix

Jul 8, 2013, 6:46:40 AM
to sage-...@googlegroups.com
If we allow a UTF-8 banner then we may also allow UTF-8 string representations for Sage objects (why not?). But I think we do not want to force the user to have UTF-8 output because it is always harder to parse a UTF-8 string than an ASCII string (this argument is also valid for the Sage banner). It might be useful to have two console modes, ASCII and Unicode, and I have no objection to having Unicode as the default. Finally, I strongly believe that we do not want to change the Sage banner at each release.
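
A minimal sketch of the two-console-mode idea, assuming nothing about Sage's actual banner code. The names `banner`, `default_mode`, `UNICODE_BOX` and `ASCII_BOX` are hypothetical, and the mode is guessed from the output stream's declared encoding, which is only a hint about what the terminal can really display.

```python
import sys

# Hypothetical helpers for illustration; this is not Sage's banner code.
UNICODE_BOX = {"h": "\u2500", "v": "\u2502",
               "tl": "\u250c", "tr": "\u2510", "bl": "\u2514", "br": "\u2518"}
ASCII_BOX = {"h": "-", "v": "|", "tl": "+", "tr": "+", "bl": "+", "br": "+"}


def default_mode(stream=None):
    """Guess the console mode from the stream's declared encoding."""
    stream = sys.stdout if stream is None else stream
    encoding = (getattr(stream, "encoding", None) or "ascii").lower()
    normalized = encoding.replace("-", "").replace("_", "")
    return "unicode" if normalized == "utf8" else "ascii"


def banner(lines, mode="ascii"):
    """Draw a box around the lines using the characters of the chosen mode."""
    box = UNICODE_BOX if mode == "unicode" else ASCII_BOX
    width = max(len(line) for line in lines)
    top = box["tl"] + box["h"] * (width + 2) + box["tr"]
    bottom = box["bl"] + box["h"] * (width + 2) + box["br"]
    body = [f'{box["v"]} {line.ljust(width)} {box["v"]}' for line in lines]
    return "\n".join([top, *body, bottom])


if __name__ == "__main__":
    # Falls back to the "+-|" box unless stdout declares UTF-8.
    print(banner(["Sage", "Type help() for help."], default_mode()))
```

Both modes produce boxes of the same shape, so anything that parses the banner only has to cope with a different set of border characters.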

Vincent

Nils Bruin

Jul 8, 2013, 6:46:42 AM
to sage-...@googlegroups.com


On Monday, July 8, 2013 11:33:18 AM UTC+2, Jean-Pierre Flori wrote:
IIRC the new GAP banner uses UTF-8.
 
You're right. With (to me) the same jarring effect, so I don't like it.

Jean-Pierre Flori

Jul 8, 2013, 7:35:18 AM
to sage-...@googlegroups.com
That was also a problem for me when I updated some of Sage's docs and LaTeX or maybe Sphinx did not like the strange UTF-8 chars used.

Volker Braun

Jul 8, 2013, 9:06:16 AM
to sage-...@googlegroups.com
On Monday, July 8, 2013 4:51:28 AM UTC-4, Nils Bruin wrote:
all the colour stuff should shut off automatically for a "dumb" terminal).

For the record, the colors are ANSI escape sequences; they have nothing to do with Unicode.
 
Why require it for *just* the banner?

As I said already, we *already* require it for display of non-ASCII characters in docstrings.
 
More personally, I dislike the UTF-8 banner because it looks too nice.

These kids nowadays, they have it too easy. When I was young, we had to walk an hour through the snow to school. And uphill. In both directions ;-)

Volker Braun

Jul 8, 2013, 9:17:15 AM
to sage-...@googlegroups.com
On Monday, July 8, 2013 6:46:40 AM UTC-4, vdelecroix wrote:
If we allow a UTF-8 banner then we may also allow UTF-8 string representations for Sage objects (why not?).

Kind of off topic, but I think there is no doubt that we will be using UTF-8 for string representations at some point in the future. It would be stupid for a mathematics project to not take advantage of all the mathematical symbol codepoints in unicode.

Having said that, it is also true that there are still non-UTF-8-capable terminals in the wild. So until we have an idea of how many people are using them, we shouldn't require a UTF-8-capable terminal to actually use Sage. But the banner and the docstrings aren't really crucial to using Sage.
 
But I think we do not want to force the user to have UTF-8 output because it is always harder to parse a UTF-8 string than an ASCII string (this argument is also valid for the Sage banner).

I disagree. The difficulty right now is that you often get a mix of byte strings with various encodings, so it's not obvious what to do. But Python 3 has only Unicode strings, so basically there will be no such thing as an ASCII string. String operations are then naturally Unicode and just as easy or difficult as if everything were ASCII.
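
As a small illustration of that point (a sketch in Python 3, with made-up variable names): once byte strings are decoded at the boundary, it no longer matters which encoding they arrived in, and string operations count characters rather than bytes.

```python
# The same name arriving as bytes in two different encodings.
raw_utf8 = "Fr\u00e9d\u00e9ric".encode("utf-8")
raw_latin1 = "Fr\u00e9d\u00e9ric".encode("iso-8859-1")

# As bytes they differ, even in length.
assert raw_utf8 != raw_latin1
assert (len(raw_utf8), len(raw_latin1)) == (10, 8)

# Decode once, using the encoding each source declares.
name_a = raw_utf8.decode("utf-8")
name_b = raw_latin1.decode("iso-8859-1")

# As str they are the same text, and operations work per character.
assert name_a == name_b
assert len(name_a) == 8
assert name_a.upper() == "FR\u00c9D\u00c9RIC"

print(len(raw_utf8), len(raw_latin1), len(name_a))
```

The hard part stays where it was: knowing which encoding the incoming bytes are in. Python 3 only makes that decision explicit and made once.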

Robert Bradshaw

Jul 8, 2013, 5:05:20 PM
to sage-devel
On Mon, Jul 8, 2013 at 1:55 AM, Julien Puydt <julien...@laposte.net> wrote:
On 08/07/2013 10:51, Nils Bruin wrote:
On Monday, July 8, 2013 4:05:29 AM UTC+2, Volker Braun wrote:

    Frédéric Chapoton has written a patch at

I agree, no need to have fancy unicode here. If you're doing something that requires nice output you should probably be using a notebook interface anyways; the CLI interface just isn't going to go there and going part way is worse than just keeping things simple.  

- Robert


Dima Pasechnik

Jul 10, 2013, 10:27:08 AM
to sage-...@googlegroups.com
On 2013-07-08, Volker Braun <vbrau...@gmail.com> wrote:
> Frédéric Chapoton has written a patch at http://trac.sagemath.org/14733
> that will beautify the Sage startup banner using some UTF-8 characters to
> draw the box. This will display incorrectly in terminals that do not
> support UTF-8. In that case, Sage still works but the box around the banner
> is garbled (most likely rendered by placeholder signs for non-ASCII
> characters).
>
> Of course, various Sage source files are already UTF-8 encoded, usually
> because of non-ASCII characters in docstrings. These will never render
> correctly in ancient terminals, nor will editing such source files lead to
> much happiness. The question is essentially, do we want to make it clear
> to the user right at the beginning that his terminal is not up to the task
> or would we rather only have more subtle errors later?

ssh from strange terminals often garbles any kind of ascii art, let
alone being UTF-8 clean etc.
I've been doing "export TERM=vt100" much too much to trust
these things.
-1 to the UTF-8 banner, sorry...

Dima

Volker Braun

Jul 10, 2013, 12:01:54 PM
to sage-...@googlegroups.com
On Wednesday, July 10, 2013 10:27:08 AM UTC-4, Dima Pasechnik wrote:
ssh from strange terminals often garbles any kind of ascii art, let
alone being UTF-8 clean etc.
I've been doing "export TERM=vt100" much too much to trust
these things.

Do you actually have an example or are you just hypothesizing? TERM settings and UTF-8 are completely orthogonal issues. In particular, the UTF-8 bytestream cannot be misinterpreted as an ANSI escape sequence by dumb terminals, which is one of the reasons for using UTF-8 in the first place.

Dima Pasechnik

Jul 10, 2013, 5:09:26 PM
to sage-...@googlegroups.com
On 2013-07-10, Volker Braun <vbrau...@gmail.com> wrote:
> On Wednesday, July 10, 2013 10:27:08 AM UTC-4, Dima Pasechnik wrote:
>
>> ssh from strange terminals often garbles any kind of ascii art, let
>> alone being UTF-8 clean etc.
>> I've been doing "export TERM=vt100" much too much to trust
>> these things.
>>
>
> Do you actually have an example or are you just hypothesizing?

well, it gives me a headache to even think of what happens when one does
ssh to a Windows host, running Sage with Cygwin, and how different
Cygwin terminals behave wrt UTF-8. Or what happens when one runs a
weird ssh implementation on Windows to connect to a Linux host running
Sage, which is a common scenario...

Volker Braun

Jul 10, 2013, 10:48:20 PM
to sage-...@googlegroups.com
Since evidently nobody who has weighed in so far has actually tried the patch, here are screenshots of a few different terminals:

http://boxen.math.washington.edu/home/vbraun/UTF8/

In particular, PuTTY works beautifully if you set it to UTF-8. And if you don't then you can still use Sage just fine, but it'll be clear from the banner that you are using the wrong encoding.


On Wednesday, July 10, 2013 5:09:26 PM UTC-4, Dima Pasechnik wrote:
well, it gives me a headache to even think of what happens when one does
ssh to a Windows host, running Sage with Cygwin, and how different
Cygwin terminals behave wrt UTF-8.

Again, the UTF-8 bytestream cannot be misinterpreted as a terminal escape sequence; this is one of the main features of UTF-8. If your terminal doesn't know about UTF-8 then it'll display a high-bit character (0x80 - 0xff) for each byte, which will generally be some accented letter. By design, this is the worst that can happen if you have a broken terminal.
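
The byte-level property is easy to check. A minimal sketch (the helper name `utf8_bytes_are_high` is made up for this example): every byte that UTF-8 uses to encode a non-ASCII character has the high bit set, so the ESC byte 0x1b that starts an ANSI escape sequence can never appear inside one.

```python
ESC = 0x1B  # the byte that introduces ANSI escape sequences


def utf8_bytes_are_high(text):
    """True if every byte encoding a non-ASCII character is >= 0x80."""
    return all(
        byte >= 0x80
        for ch in text
        if ord(ch) > 0x7F
        for byte in ch.encode("utf-8")
    )


# Box-drawing characters of the kind a banner would use.
box = "\u250c\u2500\u2510\u2502\u2514\u2518"
encoded = box.encode("utf-8")

assert utf8_bytes_are_high(box)
assert ESC not in encoded

# What a terminal that assumes ISO-8859-1 makes of an e-acute sent as UTF-8:
# two high-bit characters instead of one letter.
mojibake = "\u00e9".encode("utf-8").decode("iso-8859-1")
assert mojibake == "\u00c3\u00a9"
print(ascii(mojibake))
```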

 

Jeroen Demeyer

Jul 25, 2013, 2:05:38 PM
to sage-...@googlegroups.com
On 07/08/2013 04:05 AM, Volker Braun wrote:
> Of course, various Sage source files are already UTF-8 encoded, usually
> because of non-ASCII characters in docstrings. These will never render
> correctly in ancient terminals, nor will editing such source files lead
> to much happiness.
Just want to point out this is not really true. Editors like vim for
example can correctly transcode between file and terminal encodings. If
your terminal is iso-8859-1 for example but you're editing a utf-8 file
containing accented vowels, those will display correctly.
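
What such transcoding amounts to can be sketched in a few lines of Python. This is an illustration only, not how vim implements it; the function name and defaults are assumptions, and the terminal encoding is passed in explicitly because the program has to be told it.

```python
def transcode_for_terminal(file_bytes, file_encoding="utf-8",
                           terminal_encoding="iso-8859-1"):
    """Decode file content, then re-encode it for the terminal.

    Characters the terminal encoding cannot represent become "?".
    """
    text = file_bytes.decode(file_encoding)
    return text.encode(terminal_encoding, errors="replace")


# An accented vowel exists in ISO-8859-1, so it survives as one byte.
accented = "\u00e9crit".encode("utf-8")
assert transcode_for_terminal(accented) == b"\xe9crit"

# A box-drawing character does not exist there and is replaced.
box_line = "\u2500\u2500".encode("utf-8")
assert transcode_for_terminal(box_line) == b"??"
```

So accented vowels come through, while the box-drawing characters proposed for the banner would still be lost on an iso-8859-1 terminal.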

Volker Braun

Jul 25, 2013, 5:15:56 PM
to sage-...@googlegroups.com
On Thursday, July 25, 2013 2:05:38 PM UTC-4, Jeroen Demeyer wrote:
On 07/08/2013 04:05 AM, Volker Braun wrote:
> Of course, various Sage source files are already UTF-8 encoded, 
Editors like vim for
example can correctly transcode between file and terminal encodings. If
your terminal is iso-8859-1 for example but you're editing a utf-8 file
containing accented vowels, those will display correctly.

That's not entirely true either. You can set your editor to dumb it down for the terminal to display, but the editor has no way to find out what the terminal supports. Vim and emacs default to pass-through, letting the terminal deal with whatever is in the source file. Those in the know can then press C-x RET t <encoding> to tell emacs what the terminal supports, but if you know *that* then I'm quite confident that you have switched long ago to a terminal that supports UTF-8.

I'm still in favor of using UTF-8 for the banner. So far, the only argument against it that has been brought forward is "it looks too nice". I want my punch cards back. Get off my lawn, kids!

William Stein

Jul 26, 2013, 12:52:08 AM
to sage-...@googlegroups.com
I used the patch, and I think it is beautiful. I completely disagree
with the comments such as " I dislike the UTF-8 banner because it
looks too nice." and "Keep the banner short and simple ; people don't
come to sage to enjoy that sight!" Clean beauty is exactly what
people (at least me!) want in software. The banner in Sage right now,
which I probably wrote (?), looks frankly ugly and like a hack,
compared to the one on this patch.

Also, UTF is clearly the future of strings, having native default
support in modern interpreters, editors, etc., and also being critical
to supporting users who aren't using English.

This patch is along the same lines as the recent inclusion of a nice
color prompt (thanks Volker) in that it makes Sage prettier and more
pleasant to use.

So my strong vote *for* this ticket. Moreover, I like it so much I'll
be henceforth applying it to the standard system-wide version of Sage
at https://cloud.sagemath.com, even if it doesn't get into Sage.
In particular, I disagree with " If you're doing something that
requires nice output you should probably be using a notebook interface
anyways" -- since the terminal interface *is* part of the notebook
interface now, and it must look nice.

-- William

Robert Bradshaw

Jul 26, 2013, 3:02:41 PM
to sage-devel
At the very least, let's be careful to avoid fancy invisible unicode
characters: https://groups.google.com/forum/#!topic/sage-devel/LjC75cae7XI

Volker Braun

Jul 26, 2013, 3:26:23 PM
to sage-...@googlegroups.com
Although (perhaps?) surprising, some languages don't have spaces and require zero-width space to designate word boundaries for line breaks.
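
To make the concern about invisible characters concrete, here is a small sketch (the helper `invisible_chars` is hypothetical) that flags Unicode format characters such as ZERO WIDTH SPACE, which look like nothing on screen but still make two strings differ.

```python
import unicodedata

ZWSP = "\u200b"  # ZERO WIDTH SPACE, Unicode category Cf (format)


def invisible_chars(text):
    """Return (index, name) pairs for format characters (category Cf)."""
    return [
        (i, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Cf"
    ]


clean = "x = 1"
tainted = "x" + ZWSP + " = 1"

# The two strings typically look the same on screen but are not equal.
assert clean != tainted
assert invisible_chars(clean) == []
assert invisible_chars(tainted) == [(1, "ZERO WIDTH SPACE")]
```

A check like this only reports where such characters occur; whether one is legitimate, as in the languages mentioned above, still needs a human decision.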

Keshav Kini

Aug 2, 2013, 2:51:29 PM
to sage-...@googlegroups.com
William Stein <wst...@gmail.com> writes:
> Also, UTF is clearly the future of strings, having native default
> support in modern interpreters, editors, etc., and also being critical
> to supporting users who aren't using English.

Maybe one day Sage will support something like agda-input-method in
Emacs :)

-Keshav

Volker Braun

Aug 2, 2013, 9:49:21 PM
to sage-...@googlegroups.com
It's always hard to please everybody. We haven't really reached any consensus, nor does it sound like there is much to be gained by further discussion. So I'll set the ticket to positive review and leave you with Lichtenberg's aphorism:

Ich weiss nicht, ob es besser wird, wenn es anders wird. Aber es muss anders werden, wenn es besser werden soll.

(I don't know if it'll become better if it is changed. But it must be changed to become better.)



Bill Janssen

Jun 4, 2015, 10:39:53 AM
to sage-...@googlegroups.com
It's hard to sufficiently emphasize how ugly this looks in an Emacs shell buffer.  I use the Inconsolata font with emacs, and it doesn't have those characters (the dashes and corner characters), so Emacs goes to a different fixed-width font for them, and that font has a different width, so the horizontal dashes are about twice the width of the text lines!  I just turn it off by setting SAGE_BANNER to "no".

Bill