Emacs text bug

drain

Jan 26, 2013, 3:23:47 PM

to Help-gn...@gnu.org

Before I report this as a bug, I want to make sure it doesn't already have
a solution:

All of the "-" characters have been replaced with "\342\200\224" (which
has a different face and cannot be replaced with replace-string).

--
View this message in context: http://emacs.1067599.n5.nabble.com/Emacs-text-bug-tp276577.html
Sent from the Emacs - Help mailing list archive at Nabble.com.

Peter Dyballa

Jan 26, 2013, 5:26:35 PM

to drain, Help-gn...@gnu.org

On 26.01.2013 at 21:23, drain wrote:

> All of the "-" characters have been replaced with "\342\200\224" (which
> has a different face and cannot be replaced with replace-string).

Because the encoding of the buffer has changed? I see similar things in one specific user's GNU Emacs: in *compilation* buffers the curly quotes are turned into their byte triplets, and in dired buffers the "ä" in März, the German name for March, is sometimes lost as well. But why and when does this happen? Without this knowledge it's kind of senseless to report…

--
Greetings

Pete

The best way to accelerate a PC is 9.8 m/s²

drain

Jan 26, 2013, 5:43:08 PM

to Help-gn...@gnu.org

Perhaps the encoding did change. I recall copy / pasting a bunch of text
from a book online into the buffer, and somewhere along the way I might
have blindly changed the setting.

Which encoding system supports the "—" character?

Drew Adams

Jan 26, 2013, 5:48:56 PM

to Peter Dyballa, drain, Help-gn...@gnu.org

> But why and when does this happen? Without this knowledge
> it's kind of senseless to report.

I disagree with that claim.

While it is always better to base a bug report on more information, even just
reporting a problem can sometimes help. At the very least it gives Emacs core
developers and other users a heads-up to look further wrt the problem and its
details (e.g. "why and when").

That's already happening, because the OP posted here, thanks to your reply and
his followup wrt encoding.

Staying in one's corner because one does not have all the info or understanding
is too often a brake on progress.

Not every user has the motivation or the means, including time, to dig deeper
and investigate a problem encountered, to determine the why & when. Just
communicating that there seems to be a problem, even if one is not sure, is a
good start.

There is no way that Emacs developers can completely test every change they
make. Users reporting questions and perceived problems are indispensable to
getting it right.

IMHO, it is better for users, especially new users or those who feel unsure, to
err on the side of reporting too much than too little. It is definitely _not_
the case, IMO, that "it's kind of senseless to report" without knowledge of the
why & when.

The OP brought up the question here first, before reporting, in order to
ask whether he was missing something. That's a good thing. If the replies here
ultimately suggest that "it doesn't already have a solution", then I, for one,
encourage a bug report.

Peter Dyballa

Jan 26, 2013, 5:59:31 PM

to drain, Help-gn...@gnu.org

On 26.01.2013 at 23:43, drain wrote:

> Which encoding system supports the "—" character?

You showed before that three bytes were used for the EM DASH's encoding, so it was done in UTF-8. (This character can also be encoded in CP125[0-2] – but then as 1 byte only. ISO 8859-1 itself has no EM DASH.)
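
The byte arithmetic can be checked outside Emacs. Below is a minimal Python sketch (Python is used purely for illustration; nobody in the thread used it) showing that the octal escapes \342\200\224 are the three UTF-8 bytes of U+2014 EM DASH, and that Windows-1252 stores the same character in a single byte.

```python
import unicodedata

# The three octal escapes Emacs displayed, written as a Python bytes literal.
raw = b"\342\200\224"

# The same bytes in hexadecimal notation.
hex_form = raw.hex()

# Decoding the three bytes as UTF-8 yields exactly one character.
ch = raw.decode("utf-8")
name = unicodedata.name(ch)

# In Windows-1252 (cp1252) the same character occupies a single byte.
single = ch.encode("cp1252")

print(hex_form, name, single)
```

So a buffer that shows \342\200\224 is showing undecoded UTF-8 bytes, not a damaged character.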

--
Greetings

Pete

Chicago, n.:
Where the dead still vote … early and often!

drain

Jan 26, 2013, 6:23:55 PM

to Help-gn...@gnu.org

That was a bit tricky. The local buffer setting was "raw text", and I had
to change it to UTF-8. But the strings of codes were not automatically
converted (which would have been nice); I had to copy / paste the text into
the buffer again.

Is there a way to reload these characters once the encoding is changed? I
might have a few buffers like this, and it would save me copy / pasting
texts again. Even a replace-string approach would work for me.

Peter Dyballa

Jan 26, 2013, 6:26:49 PM

to Drew Adams, drain, Help-gn...@gnu.org

On 26.01.2013 at 23:48, Drew Adams wrote:

> While it is always better to base a bug report on more information, even just
> reporting a problem can sometimes help. At the very least it gives Emacs core
> developers and other users a heads-up to look further wrt the problem and its
> details (e.g. "why and when").

As far as I can see, this happens rarely. It happened again just a few days ago, and I was on the scene very soon afterwards. C-h l did not show anything. While the compilation was still running and the mode line showed UTF-8 encoding, I tried to fix the way the buffer contents were displayed by invoking revert-buffer-with-coding-system, C-x RET r, but it did not change anything. All other buffers (I visited) containing non-US-ASCII characters showed the same fault: the UTF-8 encoding bytes were displayed.

This could be a Mac OS X problem. Here I can see that 'find … -ls' inserts ASCII NULs, ^@, into the *shell* buffer at the transition from the file-size column to the next one, the date column. Or it happens between the date column and the file-name column – I am not completely sure. Extra characters or bytes like these could be inserted into the *compilation* buffer as well, throwing the byte sequence out of order. But why does it hit all buffers and not only the faulty one with the extraneous bytes?

There seems to be one more indication: the hardware is PowerPC, 32-bit. The Mac OS X version is also close to ancient: Mac OS X 10.4 or 10.5 (Tiger or Leopard). On Intel hardware it has not occurred yet…

--
Greetings

Pete

A blizzard is when it snows sideways.

Peter Dyballa

Jan 26, 2013, 6:29:24 PM

to drain, Help-gn...@gnu.org

On 27.01.2013 at 00:23, drain wrote:

> Is there a way to reload these characters once the encoding is changed?

Yes: revert-buffer-with-coding-system or C-x RET r <encoding> RET

--
Greetings

Pete

Work is the curse of the drinking class.
– Oscar Wilde

drain

Jan 31, 2013, 12:55:52 PM

to Help-gn...@gnu.org

Still problems.

(1) revert-buffer-with-coding-system RET
(2) utf-8 RET
(3) "Revert buffer from file[...]" y RET
(4) [characters appear as they should now]
(5) [make change so I can save]
(6) save-buffer
(7) "Select coding system (default raw-text)" utf-8
(8) "wrote buffer [...]"
(9) kill-buffer RET foo.org RET
(10) find-file foo.org RET, sees it's back to raw-text, not utf-8, with
characters mangled.

Doug Lewan

Jan 31, 2013, 1:36:51 PM

to drain, Help-gn...@gnu.org

> (9) kill-buffer RET foo.org RET
> (10) find-file foo.org RET, sees it's back to raw-text, not utf-8, with
> characters mangled.

I think that's what you should expect. Once you kill the buffer, Emacs forgets all about the file that it had held.

Apparently Emacs can't figure out that the file is UTF-8. You'll need to provide a hint. `-*- coding: utf-8 -*-' in the first line is one way. You'll find more in the Emacs Info manual, node `Coding Systems'.

I hope this helps.

,Douglas
Douglas Lewan
Shubert Ticketing
(201) 489-8600 ext 224

When I do good, I feel good. When I do bad, I feel bad and that's my religion. - Abraham Lincoln

drain

Jan 31, 2013, 1:45:31 PM

to Help-gn...@gnu.org

Doug Lewan wrote

> You'll need to provide a hint. `-*- coding: utf-8 -*-' in the first line
> is one way.

That appears to have worked. A bit ugly having that instruction at the top,
but better than manually reverting the buffer every single time.

Eli Zaretskii

Jan 31, 2013, 1:52:56 PM

to Help-gn...@gnu.org

> Date: Thu, 31 Jan 2013 09:55:52 -0800 (PST)
> From: drain <aeu...@gmail.com>
>
> Still problems.
>
> (1) revert-buffer-with-coding-system RET
> (2) utf-8 RET
> (3) "Revert buffer from file[...]" y RET
> (4) [characters appear as they should now]
> (5) [make change so I can save]
> (6) save-buffer
> (7) "Select coding system (default raw-text)" utf-8
> (8) "wrote buffer [...]"
> (9) kill-buffer RET foo.org RET
> (10) find-file foo.org RET, sees it's back to raw-text, not utf-8, with
> characters mangled.

Evidently, you have in that file bytes that are not valid UTF-8
sequences. You need to fix them (the "Select coding system ..."
prompt tells you which characters cannot be encoded in UTF-8 -- those
are the ones you need to fix).
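
For anyone who wants to hunt for such bytes outside Emacs, here is a minimal Python sketch. It assumes that "not valid UTF-8" means what a strict UTF-8 decoder rejects; the function name `find_invalid_utf8` and the sample data are made up for illustration and do not come from the thread.

```python
from typing import List


def find_invalid_utf8(data: bytes) -> List[int]:
    """Return the byte offsets at which strict UTF-8 decoding fails."""
    offsets = []
    pos = 0
    while pos < len(data):
        try:
            # If the remainder decodes cleanly, there is nothing more to report.
            data[pos:].decode("utf-8")
            break
        except UnicodeDecodeError as err:
            # err.start is relative to the slice, so shift it by pos.
            bad = pos + err.start
            offsets.append(bad)
            # Skip the offending byte and keep scanning.
            pos = bad + 1
    return offsets


# Hypothetical sample: valid UTF-8 text with one stray raw byte (0xE2) in it.
sample = "caf\u00e9 ".encode("utf-8") + b"\xe2" + b" ok"
print(find_invalid_utf8(sample))
```

To check a real file, read it in binary mode and pass the bytes to the function; an empty list means the whole file is valid UTF-8.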

Eli Zaretskii

Jan 31, 2013, 2:08:50 PM

to Help-gn...@gnu.org

> Date: Thu, 31 Jan 2013 10:45:31 -0800 (PST)
> From: drain <aeu...@gmail.com>
>
> Doug Lewan wrote
> > You'll need to provide a hint. `-*- coding: utf-8 -*-' in the first line
> > is one way.
>
> That appears to have worked. A bit ugly having that instruction at the top,
> but better than manually reverting the buffer every single time.

You shouldn't need that. You need to clean up your file instead.

drain

Jan 31, 2013, 2:28:47 PM

to Help-gn...@gnu.org

Now I see. This problem must have started when I copied an early 19th
century letter into the buffer, and the characters did not transliterate
properly into modern English. Whatever those characters were, they turned
into circumflexed /a/ (â), the pound sign (£), and a (special) right double
quotation mark (”). utf-8 apparently cannot handle these.

But why would this prevent utf-8 from encoding the rest of the buffer? Why
not just leave those three characters mangled, and display the rest
properly? It reverted fine; it just would not stay in utf-8 unless I (1)
put the instruction at the top of the buffer or (2) deleted those special
characters. So the functionality appears to be there: Emacs just would not
accept it as a saved state (absent instruction at the top).

Somehow that buffer got stuck with a limited encoding system. I'm composing
this message right now in a "scratch.org" buffer which is using utf-8-unix
-- and apparently handles those three characters fine (consequently I'm
switching the problem file from utf-8 to utf-8-unix).

Anyway, glad to get that sorted.

Eli Zaretskii

Jan 31, 2013, 3:04:54 PM

to Help-gn...@gnu.org

> Date: Thu, 31 Jan 2013 11:28:47 -0800 (PST)
> From: drain <aeu...@gmail.com>
>
> Now I see. This problem must have started when I copied an early 19th
> century letter into the buffer, and the characters did not transliterate
> properly into modern English. Whatever those characters were, they turned
> into circumflexed /a/ (â), the pound sign (£), and a (special) right double
> quotation mark (”). utf-8 apparently cannot handle these.

UTF-8 certainly _can_ handle them. I suspect that these characters
got copied as raw bytes instead.
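
That UTF-8 can represent all three characters is easy to confirm. The following Python sketch (illustrative only; the dictionary labels are invented here) encodes each of them and checks that the bytes decode back to the same character.

```python
# The three characters mentioned above, written as escapes to avoid any
# dependence on the encoding of this source text.
chars = {
    "a with circumflex": "\u00e2",
    "pound sign": "\u00a3",
    "right double quotation mark": "\u201d",
}

# Encode each character as UTF-8.
utf8_bytes = {label: ch.encode("utf-8") for label, ch in chars.items()}

# Every encoding decodes back to the original character.
round_trip_ok = all(
    encoded.decode("utf-8") == chars[label]
    for label, encoded in utf8_bytes.items()
)

for label, encoded in utf8_bytes.items():
    print(label, encoded.hex())
print("round trip ok:", round_trip_ok)
```

Each character becomes a multi-byte sequence, so a file holding only a single byte where one of them should be does not contain that character in UTF-8 form.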

> But why would this prevent utf-8 from encoding the rest of the buffer? Why
> not just leave those three characters mangled, and display the rest
> properly? It reverted fine; it just would not stay in utf-8 unless I (1)
> put the instruction at the top of the buffer or (2) deleted those special
> characters. So the functionality appears to be there: Emacs just would not
> accept it as a saved state (absent instruction at the top).

Emacs auto-detects the encoding each time you visit a file, unless
either the file (by the 'coding:' cookie) or you (by using "C-x RET c")
tell it exactly how to decode the file.
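
A crude analogy for why one stray byte can flip the whole file, sketched in Python. This is not Emacs's detection algorithm, which weighs many coding systems and priorities. It only assumes that a strict decoder either accepts the entire byte stream as UTF-8 or rejects it. The function name `guess_decoding` is hypothetical, and Latin-1 is used merely as a stand-in for showing raw bytes.

```python
from typing import Tuple


def guess_decoding(data: bytes) -> Tuple[str, str]:
    """Return (label, text): all-or-nothing UTF-8, else a raw-byte view."""
    try:
        return "utf-8", data.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to one code point, so it never fails;
        # it stands in here for displaying the bytes undecoded.
        return "raw-text", data.decode("latin-1")


# Hypothetical file contents: valid UTF-8 containing an EM DASH ...
clean = "dash \u2014 here".encode("utf-8")
# ... and the same contents with one stray raw byte appended.
dirty = clean + b"\xa3"

print(guess_decoding(clean)[0])
print(guess_decoding(dirty)[0])
```

In the second case the EM DASH is no longer decoded either, even though its own three bytes are perfectly valid, which matches the symptom of every such character in the buffer appearing mangled at once.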