Account Options

  1. Sign in
The old Google Groups will be going away soon, but your browser is incompatible with the new version.
Google Groups Home
« Groups Home
Message from discussion those funny non-ASCII characters
The group you are posting to is a Usenet group. Messages posted to this group will make your email address visible to anyone on the Internet.
Your reply message has not been sent.
Your post was successful
 
From:
To:
Cc:
Followup To:
Add Cc | Add Followup-to | Edit Subject
Subject:
Validation:
For verification purposes please type the characters you see in the picture below or the numbers you hear by clicking the accessibility icon. Listen and type the numbers you hear
 
Buchs, Kevin  
View profile  
 More options May 25 2012, 9:40 am
Newsgroups: gnu.emacs.help
From: "Buchs, Kevin" <buchs.ke...@mayo.edu>
Date: Fri, 25 May 2012 08:40:25 -0500
Local: Fri, May 25 2012 9:40 am
Subject: Re: those funny non-ASCII characters
Thanks, Xah and Eli, for contributing to my further understanding. I
went to a specific website where I got the content I copied and pasted
and I can see from the HTML that it has a charset=UTF-8, so I understand
that is Unicode 8-bit. Using the C-u C-x =, I see that the particular
character I pasted has a code point of 0x2013 (U+2013). I didn't see,
however, what the UTF-8 encoding of that code point was. Should I be
able to read that somewhere on the buffer of information I get with C-u
C-x = ? I was poking around the www.unicode.org website, trying to
understand how this U+2013 code point is encoded into UTF-8, but I
haven't determined that yet.

A fresh buffer in emacs for me on my Win-7 box has an encoding system of
iso-latin-1-dos. The coding system used to open and save files is the
same.

So, help me piece together what happens as I paste the UTF-8 text into a
buffer. First, the paste buffer must define that it is in UTF-8. Emacs
reads this information and inserts it into the byte string that defines
the buffer. Now, how does emacs record that it was a UTF-8 encoded
character? Does it translate it into a different internal encoding
instead of just recording the 8 bits transferred? Is this encoding used
as a superset of all possible encoding systems that emacs supports?

Now,  Xah, you suggest I embrace Unicode. What does that mean? Would it
involve marking all my lisp library files and my org-mode files with the
file variable -*- coding: utf-8 -*- ? Or is there another way to go
Unicode automatically?

I assume that if my lisp library files are encoded utf-8, then I can
paste that character from the web page into my call to replace-string in
order to substitute the longer dash of Unicode U+2013 with an ascii
hyphen or double hyphen. But, how does that really work? If the lisp
file is encoded utf-8, then how can I put an ascii character in the
replacement string?

I would appreciate it if someone could help me open this new door in my
brain a bit further.

Kevin Buchs | Senior Engineer | SPPDG | 507-538-5459 |
buchs.ke...@mayo.edu
Mayo Clinic | 200 First Street SW | Rochester, MN 55905 |
http://www.mayo.edu/sppdg

-----Original Message-----

With cursor on that character, type "C-u C-x =", and Emacs will show
everything it knows about that character, including its canonical
name.


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.