Emacs 24.5.1 has wrong charset on yanked text with Windows (was: Operating on yanked region)

Karl Voit

May 18, 2016, 8:14:22 AM
to help-gn...@gnu.org
* Yuri Khan <yuri....@gmail.com> wrote:
> On Wed, May 18, 2016 at 5:07 PM, Karl Voit <dev...@karl-voit.at> wrote:
>
>> When I paste text from a Windows application to my Emacs (running
>> mainly Org-mode), I get wrong charset.
>>
>> For correcting such wrong character sets, I want to write a function
>> that corrects the charset.
>
> I cannot help with your needs,

That's OK.

Besides the good points regarding charset below, I still need to
search&replace some things on the yanked region.

> but you should not have to jump through
> hoops just to get text encoding right, as long as the originating
> application puts Unicode text on clipboard. (And I’m pretty convinced
> Outlook does.)

How can I test this?

> If Emacs on Windows takes encoded text from clipboard when Unicode is
> available, it’s a bug in Emacs.

I am using a pre-compiled GNU Emacs 24.5.1 (i686-pc-mingw32) of
2015-04-11 on LEG570.

Yes, I'd prefer that Emacs yank the text in the correct charset,
which it doesn't do on my side.

Can somebody confirm this wrong behaviour?

How can I help?

--
All in all, one of the most disturbing things today is the definitive
fact that the NSA, GCHQ, and many more government organizations are
massively terrorizing the freedom of us and the next generations.
http://Karl-Voit.at


to...@tuxteam.de

May 18, 2016, 8:31:31 AM
to help-gn...@gnu.org
On Wed, May 18, 2016 at 02:13:52PM +0200, Karl Voit wrote:
> * Yuri Khan <yuri....@gmail.com> wrote:
> > On Wed, May 18, 2016 at 5:07 PM, Karl Voit <dev...@karl-voit.at> wrote:
> >
> >> When I paste text from a Windows application to my Emacs (running
> >> mainly Org-mode), I get wrong charset.
> >>
> >> For correcting such wrong character sets, I want to write a function
> >> that corrects the charset.
> >
> > I cannot help with your needs,
>
> That's OK.
>
> Besides the good points regarding charset below, I still need to
> search&replace some things on the yanked region.
>
> > but you should not have to jump through
> > hoops just to get text encoding right, as long as the originating
> > application puts Unicode text on clipboard. (And I’m pretty convinced
> > Outlook does.)
>
> How can I test this?

I know very little about Windows these days. But the documentation
for the variables "selection-coding-system" and "x-select-request-type"
might contain relevant information. You can access this doc with

C-h v <variable name>

Hope this gets you a step further

-- t

Yuri Khan

May 18, 2016, 9:26:05 AM
to Karl Voit, help-gn...@gnu.org
On Wed, May 18, 2016 at 6:13 PM, Karl Voit <dev...@karl-voit.at> wrote:

>> but you should not have to jump through
>> hoops just to get text encoding right, as long as the originating
>> application puts Unicode text on clipboard. (And I’m pretty convinced
>> Outlook does.)
>
> How can I test this?

Something like this (Note: Little Programs do little to no error checking):

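#include <stdio.h>
#include <windows.h>

int main(void)
{
    UINT format = 0;

    /* The clipboard must be open before its formats can be enumerated. */
    if (!OpenClipboard(NULL))
        return 1;
    /* EnumClipboardFormats returns 0 once there are no more formats. */
    while ((format = EnumClipboardFormats(format)) != 0)
        printf("%u\n", format);
    CloseClipboard();
    return 0;
}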

Unicode text is format 13. ANSI text is format 1. OEM text is format
7. Formats are enumerated in decreasing order of preference, i.e. most
faithful representation first.

Karl Voit

May 18, 2016, 11:54:29 AM
to help-gn...@gnu.org
* <to...@tuxteam.de> <to...@tuxteam.de> wrote:
>
> On Wed, May 18, 2016 at 02:13:52PM +0200, Karl Voit wrote:
>> * Yuri Khan <yuri....@gmail.com> wrote:
>> > On Wed, May 18, 2016 at 5:07 PM, Karl Voit <dev...@karl-voit.at> wrote:
>> >
>> >> When I paste text from a Windows application to my Emacs (running
>> >> mainly Org-mode), I get wrong charset.
>> >>
>> >> For correcting such wrong character sets, I want to write a function
>> >> that corrects the charset.
>> >
>> > I cannot help with your needs,
>>
>> That's OK.
>>
>> Besides the good points regarding charset below, I still need to
>> search&replace some things on the yanked region.
>>
>> > but you should not have to jump through
>> > hoops just to get text encoding right, as long as the originating
>> > application puts Unicode text on clipboard. (And I’m pretty convinced
>> > Outlook does.)
>>
>> How can I test this?
>
> I know very little about Windows these days. But the documentation
> for the variables "selection-coding-system" and

,----
| selection-coding-system is a variable defined in `select.el'.
| Its value is iso-latin-1-dos
| Original value was nil
`----

I once set this to 'utf-8 in my init.el. The Emacs help further
suggests setting it to 'utf-16le-dos instead when running on
Windows. With switching to this setting, the clipboard gets yanked
properly! :-)

> "x-select-request-type" might contain relevant information.

(UTF8_STRING COMPOUND_TEXT TEXT STRING)
... seems legit.

Great, that solves the charset issue.

Thanks very much! :-)


Can you still show me how I yank and operate (string-replace) only
on the yanked text?

Eli Zaretskii

May 18, 2016, 2:14:35 PM
to help-gn...@gnu.org
> From: Karl Voit <dev...@Karl-Voit.at>
> Date: Wed, 18 May 2016 17:53:51 +0200
>
> ,----
> | selection-coding-system is a variable defined in `select.el'.
> | Its value is iso-latin-1-dos
> | Original value was nil
> `----
>
> I once set this to 'utf-8 in my init.el. The Emacs help further
> suggests setting it to 'utf-16le-dos instead when running on
> Windows. With switching to this setting, the clipboard gets yanked
> properly! :-)

You shouldn't set selection-coding-system to any value on MS-Windows,
just leave it alone. It will work regardless. Setting it to anything
is asking for trouble, as it forces Emacs to use that encoding instead
of using UTF-16 when available in the clipboard.
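
A minimal sketch of how one might check what the variable currently
holds, e.g. to compare a normal session against "emacs -Q". The
function name my-selection-coding-report is made up for illustration;
the boundp guard is only there because I am not sure the variable is
defined in every build.

```elisp
;; Hypothetical helper, not part of Emacs: report the clipboard coding
;; system currently in effect, together with the platform.
(defun my-selection-coding-report ()
  "Return a string describing the current `selection-coding-system'."
  (format "system-type=%s selection-coding-system=%S"
          system-type
          (if (boundp 'selection-coding-system)
              selection-coding-system
            'unbound)))

;; Evaluate with M-: or in *scratch* to see the value in the echo area.
(message "%s" (my-selection-coding-report))
```

If the two sessions report different values, the difference presumably
comes from a setq in the init file, and removing that line should be
enough to get the default back.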

> Can you still show me how I yank and operate (string-replace) only
> on the yanked text?

Turn on transient-mark-mode, I think.

Karl Voit

May 18, 2016, 4:31:13 PM
to help-gn...@gnu.org
* Eli Zaretskii <el...@gnu.org> wrote:
>> From: Karl Voit <dev...@Karl-Voit.at>
>> Date: Wed, 18 May 2016 17:53:51 +0200
>>
>> ,----
>> | selection-coding-system is a variable defined in `select.el'.
>> | Its value is iso-latin-1-dos
>> | Original value was nil
>> `----
>>
>> I once set this to 'utf-8 in my init.el. The Emacs help further
>> suggests setting it to 'utf-16le-dos instead when running on
>> Windows. With switching to this setting, the clipboard gets yanked
>> properly! :-)
>
> You shouldn't set selection-coding-system to any value on MS-Windows,
> just leave it alone. It will work regardless. Setting it to anything
> is asking for trouble, as it forces Emacs to use that encoding instead
> of using UTF-16 when available in the clipboard.

Before I set it to 'utf-16le-dos as suggested by the manual for
Windows, I tested with no value and it did not yank correctly.

Eli Zaretskii

May 19, 2016, 12:28:46 AM
to help-gn...@gnu.org
> From: Karl Voit <dev...@Karl-Voit.at>
> Date: Wed, 18 May 2016 22:30:47 +0200
>
> > You shouldn't set selection-coding-system to any value on MS-Windows,
> > just leave it alone. It will work regardless. Setting it to anything
> > is asking for trouble, as it forces Emacs to use that encoding instead
> > of using UTF-16 when available in the clipboard.
>
> Before I set it to 'utf-16le-dos as suggested by the manual for
> Windows, I tested with no value and it did not yank correctly.

Don't you see it being set to that value in "emacs -Q"? That's what I
see here.

to...@tuxteam.de

May 19, 2016, 4:38:56 AM
to help-gn...@gnu.org
On Wed, May 18, 2016 at 05:53:51PM +0200, Karl Voit wrote:
> * <to...@tuxteam.de> <to...@tuxteam.de> wrote:

[...]

> I once set this to 'utf-8 in my init.el. The Emacs help further
> suggests setting it to 'utf-16le-dos instead when running on
> Windows. With switching to this setting, the clipboard gets yanked
> properly! :-)

That's interesting, since Eli says it shouldn't be necessary on
Windows (and he sure knows a hell of a lot more about Emacs than
I do, and much more so specifically about Emacs on windows).

That'd mean that your Emacs is confused somehow, but why?

[...]

> Can you still show me how I yank and operate (string-replace) only
> on the yanked text?

I don't know exactly what you want to achieve (manual operation, or
ultimately some automatism?), but you might start here:

- after a (normal) yank, the last mark is at the start of the
yanked text and point at its end (but mark is not active).
So if you activate it, e.g. by

M-x eval-expression RET (activate-mark) RET

you get the just yanked stuff "selected". You'll have to
wrap some of that into commands to make it practical, though.
Season to taste.

- there isn't, AFAIK, a hook hanging off the yank event itself
(a pity, IMHO), but if you somehow manage to attach the text
property named 'yank-handler (having as value a function +
arg provided by you), then this function gets the chance to
do its thing just after yanking.

Search for "yank-handler" in the Emacs Lisp manual. I'm a
bit pressed now, but if you nudge me I'd be willing to whip
up an example.
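
A minimal sketch of both ideas above, under stated assumptions: the
names my-replace-in-last-yank, my-insert-upcased and
my-kill-with-handler are made up for illustration, and the first
command relies on mark and point still bracketing the text that was
just yanked. As far as I understand the yank-handler property, its
function is called in place of the plain insertion, so it has to
insert the text itself.

```elisp
;; Idea 1: operate only on the text between mark and point, which is
;; where a plain `yank' leaves them.
(defun my-replace-in-last-yank (from to)
  "Replace literal FROM with TO between mark and point only."
  (interactive "sReplace in yanked text: \nsWith: ")
  (unless (mark t)
    (user-error "No mark set in this buffer"))
  (unless (string= from "")             ; an empty FROM would loop forever
    (let ((beg (min (mark t) (point)))
          (end (max (mark t) (point))))
      (save-excursion
        (save-restriction
          (narrow-to-region beg end)    ; nothing outside can be touched
          (goto-char (point-min))
          (while (search-forward from nil t)
            (replace-match to t t)))))))

;; Idea 2: a `yank-handler' function.  It receives the killed string
;; and is responsible for inserting it.
(defun my-insert-upcased (text)
  "Insert TEXT upcased; meant to be used as a `yank-handler' function."
  (insert (upcase text)))

(defun my-kill-with-handler (text)
  "Put TEXT on the kill ring so that yanking it goes through the handler."
  (kill-new (propertize text 'yank-handler '(my-insert-upcased))))
```

After C-y, M-x my-replace-in-last-yank touches only the text just
inserted. The handler variant only applies to text killed inside
Emacs; as far as I can tell, text arriving from another application
carries no such property.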

regards
-- t

Karl Voit

Jun 9, 2016, 6:11:18 AM
to help-gn...@gnu.org
* <to...@tuxteam.de> <to...@tuxteam.de> wrote:
>
> On Wed, May 18, 2016 at 05:53:51PM +0200, Karl Voit wrote:
>> * <to...@tuxteam.de> <to...@tuxteam.de> wrote:
>
> [...]
>
>> I once set this to 'utf-8 in my init.el. The Emacs help further
>> suggests setting it to 'utf-16le-dos instead when running on
>> Windows. With switching to this setting, the clipboard gets yanked
>> properly! :-)
>
> That's interesting, since Eli says it shouldn't be necessary on
> Windows (and he sure knows a hell of a lot more about Emacs than
> I do, and much more so specifically about Emacs on windows).
>
> That'd mean that your Emacs is confused somehow, but why?

I just replied to Eli's posting, which should also answer the
question about my (faulty) configuration.

> [...]
>
>> Can you still show me how I yank and operate (string-replace) only
>> on the yanked text?
>
> I don't know exactly what you want to achieve (manual operation, or
> ultimately some automatism?), but you might start here:

What I want to achieve (I should have started with this one in the
first place): yanked text from Outlook had wrong charset (fixed!)
and a different syntax for list items. I am trying to automate it so
that I can paste to Org-mode and get Org-mode syntax for list items.
Therefore I want to search&replace within the yanked text to look
for Outlook bullet point snippets and replace them accordingly.

> - after a (normal) yank, the last mark is at the start of the
> yanked text and point at its end (but mark is not active).
> So if you activate it, e.g. by
>
> M-x eval-expression RET (activate-mark) RET
>
> you get the just yanked stuff "selected". You'll have to
> wrap some of that into commands to make it practical, though.
> Season to taste.

Wow, this is great news. Thanks!

> - there isn't, AFAIK, a hook hanging off the yank event itself
> (a pity, IMHO), but if you somehow manage to attach the text
> property named 'yank-handler (having as value a function +
> arg provided by you), then this function gets the chance to
> do its thing just after yanking.
>
> Search for "yank-handler" in the Emacs Lisp manual. I'm a
> bit pressed now, but if you nudge me I'd be willing to whip
> up an example.

Cool help!

I am not sure if I want to modify yanking in general. I was thinking
of defining my-outlook-yank or similar that does the additional
stuff.
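
A rough sketch of such a command, assuming the Outlook bullets arrive
as U+2022 or U+00B7 at the start of a line. That pattern is a guess
and would need adjusting to whatever actually shows up in the buffer;
my-outlook-bullet-regexp and my-outlook-fix-bullets are made-up names.

```elisp
;; Hypothetical pattern: optional indentation, one bullet character
;; (U+2022 or U+00B7), optional whitespace.  Adjust to the real input.
(defvar my-outlook-bullet-regexp "^[ \t]*[\u2022\u00b7][ \t]*"
  "Regexp matching a bullet as pasted from Outlook (an assumption).")

(defun my-outlook-fix-bullets (beg end)
  "Turn Outlook-style bullets between BEG and END into Org list markers."
  (save-excursion
    (save-restriction
      (narrow-to-region beg end)        ; leave the rest of the buffer alone
      (goto-char (point-min))
      (while (re-search-forward my-outlook-bullet-regexp nil t)
        (replace-match "- " t t)))))

(defun my-outlook-yank ()
  "Yank, then convert Outlook bullets in the yanked text only."
  (interactive)
  (yank)
  ;; After `yank', mark is at the start of the inserted text and point
  ;; is at its end, so these two positions delimit exactly what was
  ;; pasted.
  (my-outlook-fix-bullets (mark t) (point)))
```

Bound to a key of its own, this leaves the normal C-y untouched.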

to...@tuxteam.de

Jun 9, 2016, 6:18:36 AM
to help-gn...@gnu.org
On Thu, Jun 09, 2016 at 12:10:57PM +0200, Karl Voit wrote:
> * <to...@tuxteam.de> <to...@tuxteam.de> wrote:

[...]

> Cool help!

:-)

I've been helped so often here: glad to be of some help, then.

> I am not sure if I want to modify yanking in general. I was thinking
> of defining my-outlook-yank or similar that does the additional
> stuff.

Sounds reasonable

regards
-- t
