Unicode->ASCII in copy/paste

Lauren Weinstein

May 1, 2014, 1:42:46 PM
to chromiu...@chromium.org

Greetings. I frequently need to copy small segments of text from web
pages to ASCII files (via secure shell). Is there any way to
automatically convert the most common punctuation marks in this
process? As it stands now, I have to manually read through and correct
each segment if it contains apostrophes or dashes, which paste through
as lowercase "b" currently. Thanks!

--Lauren--
Lauren Weinstein (lau...@vortex.com): http://www.vortex.com/lauren
Co-Founder: People For Internet Responsibility: http://www.pfir.org/pfir-info
Founder:
- Network Neutrality Squad: http://www.nnsquad.org
- PRIVACY Forum: http://www.vortex.com/privacy-info
Member: ACM Committee on Computers and Public Policy
Lauren's Blog: http://lauren.vortex.com
Google+: http://google.com/+LaurenWeinstein
Twitter: http://twitter.com/laurenweinstein
Tel: +1 (818) 225-2800 / Skype: vortex.com

Robert Ginda

May 1, 2014, 1:49:49 PM
to Lauren Weinstein, chromium-hterm
You're copying Unicode characters from Chrome and pasting them into an editor running in Secure Shell? Can you create a search/replace macro for your editor that takes care of the punctuation you care about?

Doing this in a generic way in Secure Shell doesn't sound like a good idea.  We'd need a special "transform, then paste" action, and probably a way to configure the transformation.


Rob.




Lauren Weinstein

May 1, 2014, 2:13:11 PM
to Robert Ginda, chromium-hterm

Hmm. No, macros are not a viable option in this case. With simple
telnet terminal emulators (some of them quite old) this has never come
up as a problem before, oddly enough. Thanks.

--Lauren--

Robert Ginda

May 1, 2014, 2:31:18 PM
to Lauren Weinstein, chromium-hterm
ssh doesn't really add anything to the mix that would change this behavior.  If you can copy/paste a Unicode apostrophe into a text-based editor and have it do what you expect in telnet, it should also work in Secure Shell.  You may have found a bug, or your encoding may not be properly configured on the host.

Can you give concrete steps to reproduce the issue?

Lauren Weinstein

May 1, 2014, 3:16:47 PM
to Robert Ginda, chromium-hterm

OK, here's a specific example:

http://www.wired.com/2014/05/license-plate-tracking/

If that page is displayed in Chrome on a Win7 machine and the text
"It's no secret that" is copied, pasting it (into pretty much anything)
gives the same text back correctly (even in an old ASCII text editor
via telnet).

But it's different through hterm. The actual source is:

It’s no secret that

This shows up when pasted through hterm as:

Itbps no secret that
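
One possible explanation for the "b", offered as an assumption and not something confirmed in this thread: if some hop on the path strips the eighth bit from each byte, the UTF-8 encoding of U+2019 (bytes E2 80 99) becomes 62 00 19, which is a lowercase "b" followed by two control characters. The same would hold for dashes such as U+2013 and U+2014, whose UTF-8 encodings also start with E2. This accounts for the "b" but not necessarily for everything else the editor displays. A minimal Python sketch of that hypothesis (the function name is made up for illustration):

```python
def strip_high_bit(data: bytes) -> bytes:
    """Simulate a hypothetical 7-bit path by clearing bit 8 of every byte."""
    return bytes(b & 0x7F for b in data)


# "It’s" as UTF-8 bytes: 49 74 E2 80 99 73
original = "It\u2019s".encode("utf-8")

# After masking: 49 74 62 00 19 73, i.e. "It", "b", NUL, 0x19, "s"
mangled = strip_high_bit(original)
print(mangled)
```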

Thanks.

--Lauren--

Robert Ginda

May 1, 2014, 3:34:40 PM
to Lauren Weinstein, chromium-hterm
That works OK for me on my Ubuntu and OS X machines. Unfortunately I don't have a Windows machine handy to check there. Can you make sure that your LANG environment variable on the host is set to "en_US.UTF-8"?

Also double check that `curl -s https://raw.githubusercontent.com/libapps/libapps-mirror/master/hterm/test_data/utf-8.txt` properly displays the contents of the file in Secure Shell.

Maybe there's a paste-utf-8-in-windows specific bug here.


Rob.

Lauren Weinstein

May 1, 2014, 4:49:00 PM
to Robert Ginda, chromium-hterm

It looks like part of the path involved is not UTF-8 clean (Windows was
just for testing; the actual case is all Linux systems). Windows appears
to be automatically converting from UTF-8 on a paste operation, so that
just works. But going through SSH, it's going to require a manual iconv
operation because some of the code cannot reasonably be converted to
UTF-8 at this time.
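
For anyone wanting to script that manual conversion step, here is a rough Python sketch of the idea, similar in spirit to a transliterating iconv run. The mapping table and the function name are illustrative assumptions, not anything from hterm or Secure Shell, and the list of punctuation is deliberately short:

```python
# Hypothetical mapping of common "typographic" punctuation to ASCII.
# Extend it to cover whatever characters show up in the pasted text.
PUNCT_TO_ASCII = {
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark / apostrophe
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2026": "...",  # horizontal ellipsis
    "\u00a0": " ",    # no-break space
}

_TABLE = str.maketrans(PUNCT_TO_ASCII)


def to_ascii(text: str) -> str:
    """Replace known punctuation, then turn any other non-ASCII char into '?'."""
    translated = text.translate(_TABLE)
    return translated.encode("ascii", errors="replace").decode("ascii")


print(to_ascii("It\u2019s no secret that"))
```

Anything outside the table is replaced with "?" instead of being dropped silently, so leftover non-ASCII characters remain easy to spot.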

Thanks for your help with this!

--Lauren--

Wayne Davison

Mar 11, 2015, 1:44:37 PM
to chromiu...@chromium.org
Yeah, this bug is really annoying. I am in a crouton chroot with UTF-8 enabled, and if I double-click a Unicode character in the shell window and then press Ctrl-Shift-V, it pastes as "encoded" UTF-8. I can run the text through a decode script, such as the following Perl snippet:

perl -e 'undef $/; $_ = <>; utf8::decode($_); print'

... which converts the characters to proper Unicode. It would sure be nice if there were an option to do that utf8::decode() step on all pasted content.
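
The same decode step can be sketched in Python, assuming that "encoded" here means each byte of the original UTF-8 was treated as a Latin-1 character and then encoded a second time. The function name is hypothetical, and the sketch undoes only one such layer:

```python
def fix_double_encoded(text: str) -> str:
    """Undo one layer of UTF-8 bytes that were misread as Latin-1 characters."""
    try:
        # Map each character back to the byte it came from, then decode
        # the resulting byte string as UTF-8.
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not double-encoded in the assumed way; leave the text alone.
        return text


# Build a double-encoded sample from "It’s" and repair it.
mangled = "It\u2019s".encode("utf-8").decode("latin-1")
print(fix_double_encoded(mangled) == "It\u2019s")
```

Text that is already correct Unicode, or plain ASCII, passes through unchanged because one of the two conversions either fails or is a no-op.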