Unicode Problems


Paul Walger

May 1, 2014, 2:00:18 PM
to chromiu...@chromium.org
Hello,

I don't know since when, but hterm has some problems displaying Unicode characters.
I tried the normal "Secure Shell" and "Secure Shell (dev)" version 0.8.27, both have the same problem.

How to reproduce: 
Open a shell with Ctrl+Alt+T and type äö, and you will see "Ã¤Ã¶" instead.

So I played around with nassh_deps.concat.js, and after changing line 12827 from
"return this.decodeUTF8(str);" to "return this.decodeUTF8(this.decodeUTF8(str));" the problem disappeared.

/**
 * Decode a string according to the 'receive-encoding' preference.
 */
hterm.VT.prototype.decode = function(str) {
  if (this.characterEncoding == 'utf-8')
    return this.decodeUTF8(this.decodeUTF8(str));
  return str;
};

Robert Ginda

May 1, 2014, 2:06:41 PM
to Paul Walger, Toni Barzic, chromium-hterm
(+tbarzic)

Ctrl+Alt+T opens crosh, which uses a custom Chrome API to communicate with the local "crosh" command on Chrome OS.  From your diagnosis, it sounds like maybe the encoding conventions for this channel changed in Chrome itself.

Toni is looking into some unicode troubles with crosh this week, maybe he'll have an idea what's gone wrong.

If you're using crosh only for the ssh command, try using Secure Shell directly instead.


Rob.


--
You received this message because you are subscribed to the Google Groups "chromium-hterm" group.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/chromium-hterm/a68487a0-c781-4c25-a25d-45b062dfc752%40chromium.org.

Toni Barzic

May 1, 2014, 6:41:15 PM
to Robert Ginda, Paul Walger, chromium-hterm
The problem occurs when sending keyboard input to terminalPrivate API.
In Terminal.prototype.onVTKeystroke the keystroke is UTF-8 encoded (because keyboard.characterEncoding is set to 'utf-8') before it is passed to the io.onVTKeystroke implementation, which calls chrome.terminalPrivate.sendInput. The problem is that the extension system expects a UTF-16 string, and it encodes the received string to UTF-8 (again) before the SendInput extension function implementation receives it.
So, e.g., 'č' (0xc4 0x8d in UTF-8) becomes 0xc3 0x84 0xc2 0x8d, because 0xc4 UTF-16 is 0xc3 0x84 in UTF-8 and 0x8d UTF-16 is 0xc2 0x8d in UTF-8.
When the input is echoed back, the crosh extension API converts the UTF-8 passed from the crosh process to UTF-16 and we get 0xc4 0x8d UTF-16, i.e. Ä followed by a control character.
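
To make the byte arithmetic above concrete, here is a minimal sketch that reproduces the double encoding outside of hterm. The helper names (encodeUTF8AsBinaryString, decodeUTF8FromBinaryString, toHex) are hypothetical and only stand in for the two encoding steps; they are not hterm or Chrome functions. It assumes a runtime that provides TextEncoder and TextDecoder.

```javascript
// Illustrative only: these helpers are hypothetical, not hterm or Chrome APIs.

// Encode a string as UTF-8 and return a "binary string" in which every
// character carries one byte value (0x00-0xff).
function encodeUTF8AsBinaryString(str) {
  const bytes = new TextEncoder().encode(str);
  return Array.from(bytes, (b) => String.fromCharCode(b)).join('');
}

// Inverse of the above: treat each character as one byte and decode as UTF-8.
function decodeUTF8FromBinaryString(str) {
  const bytes = Uint8Array.from(str, (ch) => ch.charCodeAt(0));
  return new TextDecoder('utf-8').decode(bytes);
}

// Render each character's code unit as hex, for inspection.
function toHex(str) {
  return Array.from(str, (ch) => '0x' + ch.charCodeAt(0).toString(16)).join(' ');
}

// First encoding: what the keyboard layer does when the encoding is 'utf-8'.
const encodedOnce = encodeUTF8AsBinaryString('č');
// Second encoding: the already encoded string is treated as UTF-16 text.
const encodedTwice = encodeUTF8AsBinaryString(encodedOnce);
// A single decode only undoes one of the two encodings.
const decodedOnce = decodeUTF8FromBinaryString(encodedTwice);
// A second decode recovers the original character.
const decodedTwice = decodeUTF8FromBinaryString(decodedOnce);

console.log(toHex(encodedOnce));   // 0xc4 0x8d
console.log(toHex(encodedTwice));  // 0xc3 0x84 0xc2 0x8d
console.log(toHex(decodedOnce));   // 0xc4 0x8d
console.log(decodedTwice);         // č
```

This also shows why decoding twice, as in the workaround earlier in the thread, hides the symptom: the second decode cancels the extra encoding step.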

Maybe we should force keyboard.characterEncoding value to be 'raw' for crosh?


Robert Ginda

May 1, 2014, 6:55:27 PM
to Toni Barzic, Paul Walger, chromium-hterm
We could overwrite the character encoding for crosh, bypassing the prefs, but if the pref ever changed while crosh was running it would get set back.

I think since we're only dealing with keyboard input, we can pay the cost of decoding the UTF-8 encoded input on its way to terminalPrivate, iff the output encoding is set to 'utf-8'.
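
A rough sketch of what that could look like, not the actual patch. The function name decodeKeystrokeForExtensionApi and its characterEncoding parameter are hypothetical; the idea is only that the keystroke string is turned back into ordinary UTF-16 text before it is handed to the extension API, and is left alone for any other encoding. It assumes TextDecoder is available.

```javascript
// Hypothetical helper, not part of hterm: undo the keyboard layer's UTF-8
// encoding before the string is handed to an API that expects UTF-16 text.
function decodeKeystrokeForExtensionApi(str, characterEncoding) {
  // Only undo the encoding when the keyboard layer actually applied it.
  if (characterEncoding !== 'utf-8') {
    return str;
  }
  // Each character of the encoded keystroke holds one UTF-8 byte.
  const bytes = Uint8Array.from(str, (ch) => ch.charCodeAt(0));
  return new TextDecoder('utf-8').decode(bytes);
}

// 'č' as the keyboard layer would deliver it with 'utf-8' encoding.
const encodedKeystroke = '\u00c4\u008d';
console.log(decodeKeystrokeForExtensionApi(encodedKeystroke, 'utf-8'));
console.log(decodeKeystrokeForExtensionApi('abc', 'raw'));
```

Because the check reads the encoding at call time, it keeps working if the pref changes while crosh is running.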

Toni Barzic

May 1, 2014, 7:02:07 PM
to Robert Ginda, Paul Walger, chromium-hterm
yeah, that should also be fine.

brettm...@gmail.com

May 5, 2014, 11:57:40 AM
to chromiu...@chromium.org
thanks!