emacs lisp: asciify unicode string
From: Julian Bradfield <j...@inf.ed.ac.uk>
Newsgroups: comp.lang.lisp,comp.emacs
Subject: Re: emacs lisp: asciify unicode string
Date: Tue, 8 Mar 2011 08:22:24 +0000 (UTC)
Organization: School of Informatics, The University of Edinburgh
Lines: 18
Message-ID: <slrninbpq0.aj0.jcb@krk.inf.ed.ac.uk>
References: <fc493728-ed61-4b1c-91f5-f06d184f5c76@j35g2000prb.googlegroups.com>
On 2011-03-07, Xah Lee <xah...@gmail.com> wrote:
> (defun asciify-string (inputstr)
> "Make unicode string into equivalent ASCII ones.
> Todo: this command is not exhaustive."
> (let ()
> (setq inputstr (replace-regexp-in-string "á\\|à\\|â\\|ä" "a"
etc.
Yep, it sure is not exhaustive.
The right way to do this is to fix the "open source dictionary",
whatever that is, as anything that doesn't handle Unicode is pretty
much useless in today's world.
Failing that, the right way to do asciify-string is to use the
Unicode Character Database: convert to NFKD, then remove all the
combining characters and any remaining non-ASCII characters (how
are you going to asciify Russian, Greek, Chinese?).
This is (probably) a couple of lines of Perl, but it's not built into Emacs.
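A minimal sketch of the NFKD approach in Emacs Lisp, assuming your Emacs ships the ucs-normalize library with its `ucs-normalize-NFKD-string` function, and using `get-char-code-property` to read each character's canonical combining class. The name `my-asciify-string` is hypothetical. Note that the ASCII test alone would also discard the combining marks; the two checks are kept separate only to mirror the steps described above.

```elisp
(require 'ucs-normalize)

(defun my-asciify-string (string)
  "Return an ASCII-only approximation of STRING.
Hypothetical helper: decompose STRING to NFKD, drop combining
characters, then drop any remaining non-ASCII characters."
  (let ((kept '()))
    ;; NFKD splits accented letters into base letter + combining mark,
    ;; and maps compatibility forms such as the ligature U+FB01 to "fi".
    (dolist (ch (string-to-list (ucs-normalize-NFKD-string string)))
      (let ((ccc (get-char-code-property ch 'canonical-combining-class)))
        ;; Keep a character only if it is not a combining mark
        ;; (combining class 0 or unknown) and is plain ASCII.
        (when (and (or (null ccc) (zerop ccc))
                   (< ch 128))
          (push ch kept))))
    ;; `push' built the list in reverse order, so flip it back.
    (apply #'string (nreverse kept))))
```

With this, `(my-asciify-string "café")` should give "cafe", while Greek or Russian text comes out empty: the filter can strip accents, but it cannot transliterate other scripts.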