Unknown character encodings in .emacs

Alan E. Davis

Jun 7, 2008, 9:32:32 AM

to help-gn...@gnu.org

I have stumbled on a problem in my ~/.emacs.el . I am at a loss to even explain it. It seems to me like it might be related to character encoding.

For a number of months, or even years, I have been encountering messages asking what encoding I wish to use to save files. I have gotten into the habit of saying "utf-8" because it gets me out of there, but I haven't a clue what this might do or mean.

Now, my .emacs has gotten buggered. I pasted some snippets from a web site. Later on, those snippets, it seems, have gotten expressed as garbage characters. I've been poking around, but don't have a clue where to start. I never did understand encoding---I mean, what it does, what parts of the system are involved. I cannot even get to first base debugging this.

As an example, I have an abbreviation table that has been in my init file for 15 years, and now reads as gibberish.

(define-abbrev-table 'global-abbrev-table '(
    ("%`" "‚\°" nil 0)
    ("a`" "ÃƒÂ¡" nil 1)
    ("A`" "ÃƒÂ " nil 1)
    ("o`" "ÃƒÂ³" nil 2)
    ("O`" "ÃƒÂ"" nil 1)
    ("u`" "ÃƒÂº" nil 2)
    ("U`" "ÃƒÂš" nil 1)
    ("n`" "ÃƒÂ±" nil 0)
    ("e`" "ÃƒÂ©" nil 4)
    ("E`" "ÃƒÂ‰" nil 1)
    ("m`" "Ã‚Âµ" nil 0)
    ("p`" "Ã‚Â¶" nil 0)
    ("s`" "Ã‚Â§" nil 0)
    ("y`" "\245" nil 0)   ;; Ã‚Â¥
    ("?`" "Ã‚Â¿" nil 0)
    ("!`" "Ã‚Â¡" nil 0)
    (":`" "ÃƒÂ·" nil 0)
    ("<`" "Ã‚Â«" nil 0)
    (">`" "\273" nil 0)
    ("/`" "\370" nil 0)   ;; ÃƒÂ¸
    ("!`" "\241" nil 0)   ;; Ã‚Â¡
    ("b`" "\337" nil 0)   ;; ÃƒÂŸ
    ("c`" "\242" nil 0)   ;; Ã‚Â¢
    ))

Can someone point me to an explanation of why this happened, and how to fix it?

Thank you very much,

Alan

--
Alan Davis :    lng...@gmail.com

"It's never a matter of liking or disliking ..."
---Santa Ynez Chumash Medicine Man

"We have no art. We do everything as well as we can." ---Balinese saying

Peter Dyballa

Jun 7, 2008, 2:34:28 PM

to Alan E. Davis, help-gn...@gnu.org

On 07.06.2008, at 15:32, Alan E. Davis wrote:

> As an example, I have an abbreviation table that has been in my init file
> for 15 years, and now reads as gibberish.

Check which encoding is used in a backup of your init file. Then open
the init file with a prefix command: C-x RET c <the encoding> RET,
followed by C-x C-f (or e in dired-mode). Now put into its first line:

;; -*- mode: Emacs-Lisp; coding: utf-8; -*-

and save it with a prefix command: C-x RET c utf-8 RET C-x C-s. This
makes sure that the file is read into GNU Emacs in its original
encoding, is saved as UTF-8, *and* will always be re-opened as UTF-8.
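The keyboard recipe above can also be sketched in Emacs Lisp, which makes each step explicit. `insert-file-contents`, `write-region`, `coding-system-for-read`, `coding-system-for-write` and `last-coding-system-used` are standard Emacs; the helper names `my/file-coding` and `my/resave-as-utf-8`, and the file names in the usage lines, are hypothetical. The sketch assumes you already know (for example from the backup) which encoding the file was last saved in, and it overwrites the file, so keep a copy first.

```elisp
;; Hypothetical helpers: a sketch of the C-x RET c recipe, not a
;; standard Emacs command.

(defun my/file-coding (file)
  "Return the coding system Emacs chooses when it reads FILE."
  (with-temp-buffer
    (insert-file-contents file)
    ;; `insert-file-contents' records the coding system it used here.
    last-coding-system-used))

(defun my/resave-as-utf-8 (file from-coding)
  "Read FILE decoded as FROM-CODING and write it back as UTF-8.
Roughly C-x RET c FROM-CODING RET C-x C-f, adding the coding
cookie, then C-x RET c utf-8 RET C-x C-s."
  (with-temp-buffer
    ;; Force the decoding, as the C-x RET c prefix does for C-x C-f.
    (let ((coding-system-for-read from-coding))
      (insert-file-contents file))
    ;; Add the -*- ... coding: utf-8; -*- line unless line 1 has one.
    (goto-char (point-min))
    (unless (looking-at ".*-\\*-.*coding:")
      (insert ";; -*- mode: Emacs-Lisp; coding: utf-8; -*-\n"))
    ;; Force the encoding, as the C-x RET c prefix does for C-x C-s.
    (let ((coding-system-for-write 'utf-8))
      (write-region (point-min) (point-max) file))))

;; Usage, e.g. with M-: (file names and encoding are examples only):
;; (my/file-coding "~/.emacs.el~")
;; (my/resave-as-utf-8 "~/.emacs.el" 'latin-1)
```

Binding `coding-system-for-read` and `coding-system-for-write` around a single call is what the C-x RET c prefix does for the next command, so the rest of the session is unaffected.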

The way your abbreviation table looks makes it obvious that its UTF-8
file contents were opened in some 8-bit mode and saved again as UTF-8.
8-bit characters ("extended" US-ASCII) are encoded in UTF-8 as two
bytes. When you re-read them in an 8-bit mode (1 byte = 1 char, while
in UTF-8 it would be 2 bytes = 1 char) and save them again as UTF-8,
they become four bytes.
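The double encoding described above can be reproduced, and undone, with two standard functions, `encode-coding-string` and `decode-coding-string`. In this sketch the helper names `my/mis-decode`, `my/re-decode` and `my/repair-region` are hypothetical, and the wrong 8-bit encoding is assumed to be windows-1252 rather than strict Latin-1: the two differ only for bytes #x80 to #x9F, and the ƒ and ‚ in the garbled table are how windows-1252 displays #x83 and #x82.

```elisp
(defun my/mis-decode (string wrong-coding)
  "Encode STRING as UTF-8, then decode the bytes as WRONG-CODING.
One round of the damage: UTF-8 text read in an 8-bit mode."
  (decode-coding-string (encode-coding-string string 'utf-8) wrong-coding))

(defun my/re-decode (string wrong-coding)
  "Undo one round of `my/mis-decode': recover the bytes, read as UTF-8."
  (decode-coding-string (encode-coding-string string wrong-coding) 'utf-8))

(defun my/repair-region (beg end rounds)
  "Undo ROUNDS rounds of windows-1252 mis-decoding between BEG and END.
Interactively, ROUNDS is the numeric prefix argument (default 1)."
  (interactive "r\np")
  (let ((text (buffer-substring-no-properties beg end)))
    (dotimes (_ rounds)
      (setq text (my/re-decode text 'windows-1252)))
    (save-excursion
      (delete-region beg end)
      (goto-char beg)
      (insert text))))

;; "\u00e1" is the character a with acute accent.
(my/mis-decode "\u00e1" 'windows-1252)                                ; => "Ã¡"
(my/mis-decode (my/mis-decode "\u00e1" 'windows-1252) 'windows-1252)  ; => "ÃƒÂ¡"
```

To repair text in place, select only the garbled part and run M-x my/repair-region, with a numeric prefix such as M-2 for two rounds. Correctly encoded non-ASCII characters inside the region would be damaged by the extra round, which is why the region should cover only the garbled text.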

--
Greetings

Pete

We have to expect it, otherwise we would be surprised.

Nikolaj Schumacher

Jun 7, 2008, 2:38:54 PM

to Alan E. Davis, help-gn...@gnu.org

"Alan E. Davis" <lng...@gmail.com> wrote:

> I've been poking around, but don't have a clue where to start.
> I never did understand encoding

Well, that's a clue where to start, I suppose. :)

> ("a`" "ÃƒÂ¡" nil 1)

Looks like the file has been written using UTF-8 and read using
Latin-1. Try adding:

(set-language-environment "UTF-8")

However, I believe this shouldn't be necessary if the system is
configured correctly. What OS do you use?
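A sketch of this suggestion as an init-file fragment. `set-language-environment` and `prefer-coding-system` are standard Emacs functions; adding `prefer-coding-system` is an extra assumption of this sketch, and, as noted above, neither line should be needed when the locale (e.g. LANG) already names UTF-8.

```elisp
;; Sketch for ~/.emacs.el, assuming UTF-8 should be the default both for
;; new files and for guessing the encoding of existing ones.
(set-language-environment "UTF-8")
(prefer-coding-system 'utf-8)

;; To see what Emacs picked up from the environment, evaluate with M-:
;; (list (getenv "LANG") locale-coding-system buffer-file-coding-system)
```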

regards,
Nikolaj Schumacher

Giorgos Keramidas

Jun 7, 2008, 10:04:30 AM

On Sat, 7 Jun 2008 23:32:32 +1000, "Alan E. Davis" <lng...@gmail.com> wrote:
> I have stumbled on a problem in my ~/.emacs.el . I am at a loss to even
> explain it. It seems to me like it might be related to character encoding.
>
>
> For a number of months, or even years, I have been encountering messages
> asking what encoding I wish to use to save files. I have gotten into the
> habit of saying "utf-8" because it gets me out of there, but I haven't a
> clue what this might do or mean.
>
> Now, my .emacs has gotten buggered. I pasted some snippets from a web
> site. Later on, those snippets, it seems, have gotten expressed as garbage
> characters. I've been poking around, but don't have a clue where to start.
> I never did understand encoding---I mean, what it does, what parts of the
> system are involved. I cannot even get to first base debugging this.
>
> As an example, I have an abbreviation table that has been in my init file
> for 15 years, and now reads as gibberish.
>
> (define-abbrev-table 'global-abbrev-table '(
> ("%`" "‚\°" nil 0)
> ("a`" "ÃƒÂ¡" nil 1)

[...]

> ("c`" "\242" nil 0) ;; Ã‚Â¢
> ))
>
> Can someone point me to an explanation of why this happened, and how to fix
> it?

Which Emacs version are you using?

Alan E. Davis

Jun 7, 2008, 6:36:16 PM

to Nikolaj Schumacher, Peter Dyballa, help-gn...@gnu.org

Thank you very much, Nikolaj and Peter:

As of now, I see:
GDM_LANG=en_US.UTF-8
LANG=en_US.UTF-8

Emacs can read .emacs.el now: I have incrementally deleted or commented out the parts that Emacs tripped over when reading .emacs. I also placed the header that Peter suggested into the file:

;; -*- mode: Emacs-Lisp; coding: utf-8; -*-

Now the question comes up: what is the most appropriate coding system to use? I guess that is a matter for another post. Or is it? Should I be using a two-byte language encoding at all? All the characters I ordinarily use are available in the Latin-1 encoding. The original reason I turned to a text editor was the need for raw text input and output for language data (I was working on a lexicon of Chuukese at the time).

I was also put off by the Windows encodings that Emacs recommended when a file was being saved. I avoid all proprietary file formats, so I was a bit taken aback by that suggestion.

I have started to read more about this problem/issue. I have gotten away with a sloppy .emacs.el file for a while, and sloppy conventions on encoding. I need to look into it now. However, it is not something I would choose to spend a great deal of time on: it's only of secondary importance to my work.

It is suspicious that the problem only happened recently. The first instance I discovered was a cut-and-paste of .emacs snippets from Firefox. I've been using this same .emacs.el (with many changes and additions) for 15 years, all on GNU/Linux systems. I have had to adapt, but nothing this radical.

Thank you again,

Alan

--
Alan Davis, Kagman High School, Saipan lng...@gmail.com

Alan

Jun 7, 2008, 7:09:14 PM

On Jun 8, 12:04 am, Giorgos Keramidas <keram...@ceid.upatras.gr> wrote:

> Which Emacs version are you using?

emacs-snapshot on Ubuntu GNU/Linux 8.04. I seldom use Emacs 21 or
Emacs 22 (not at all over the past few months), but they are installed
because of dependencies.

Thank you.

Alan

Alan E. Davis

Jun 7, 2008, 8:35:23 PM

to Nikolaj Schumacher, Peter Dyballa, help-gn...@gnu.org

PARTLY SOLVED.

Thank you to the people who responded. I have gotten partway through this. For now, I am able to start Emacs again. I also have backups of my .emacs.el, so I can check them too.

Thank you again,

Alan