emacs lisp: asciify unicode string
21 messages
Xah Lee  
Newsgroups: comp.lang.lisp, comp.emacs
From: Xah Lee <xah...@gmail.com>
Date: Mon, 7 Mar 2011 13:58:14 -0800 (PST)
Local: Mon, Mar 7 2011 4:58 pm
Subject: emacs lisp: asciify unicode string
little elisp tutorial. comment welcome.

〈Emacs: Zap Gremlins (UNICODE chars ⇒ ASCII)〉
http://xahlee.org/emacs/emacs_zap_gremlins.html

────────────────────
Emacs: Zap Gremlins (UNICODE chars ⇒ ASCII)

Xah Lee, 2011-03-07

This page shows a little function that changes unicode string into
ASCII. For example “passé” becomes “passe”, “voilà” becomes “voila”.

When refactoring my elisp code last week, i split out this little
function. It turns unicode chars into roughly equivalent ASCII ones. I
needed this because the open source dictionary will choke on words
with unicode chars. (See: Emacs Dictionary Lookup ◇ Problems of Open
Source Dictionaries.)

I remember that BBEdit, the popular Mac editor i used 10 years ago
before emacs, has such a command in its menu, called “zap
gremlins”. I'm not aware of one in emacs, though there might be.
Anyway, here's the code:

(defun asciify-string (inputstr)
  "Make unicode string into equivalent ASCII ones.
Todo: this command is not exhaustive."
  (let ()
    (setq inputstr (replace-regexp-in-string "á\\|à\\|â\\|ä" "a" inputstr))
    (setq inputstr (replace-regexp-in-string "é\\|è\\|ê\\|ë" "e" inputstr))
    (setq inputstr (replace-regexp-in-string "í\\|ì\\|î\\|ï" "i" inputstr))
    (setq inputstr (replace-regexp-in-string "ó\\|ò\\|ô\\|ö" "o" inputstr))
    (setq inputstr (replace-regexp-in-string "ú\\|ù\\|û\\|ü" "u" inputstr))
    inputstr))

You might improve this code, as right now it's puny. Right now it's a
function that takes in a string. You might also create a version that
works on region, or better yet, works on text selection if there's
one, else on current word (or line, or paragraph, or buffer, your
design call). (For how, see: Emacs Lisp: Using thing-at-point.)

Here's common non-english letters: ÀÁÂÃÄÅÆ Ç ÈÉÊË ÌÍÎÏ ÐÑ ÒÓÔÕÖ
ØÙÚÛÜÝÞß àáâãäåæç èéêë ìíîï ðñòóôõö øùúûüýþÿ. You also might consider
changing unicode bullet “•” to “*”, and others such as “→” to “->”,
“≥” to “>=”, etc.

Or, perhaps you know someone has written this somewhere?

────────────────────
Accumulator vs Parallel Programing

When looking at my code, another thing that piqued my interest is
how sequential the algorithm is. The paradigm
is similar to what's called “accumulator” or “iteration”. Recently, i
watched Guy Steele's talk on parallel programing (See: Guy Steele on
Parallel Programing.) and learned that the iteration style makes it very
difficult for a compiler to automatically generate parallel code.

A better way to write it for parallel programing, is to “map” a char-
transform function to the string. (in elisp, a string datatype is also
a sequence, and can be the second argument to “mapcar”.) It will
probably become slower, but it'll be good in n years when someday
emacs lisp becomes Scheme Lisp or something.
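
The “map” idea above could be sketched roughly as follows. This is only an illustration, not code from the original page: the names my-asciify-char-table, my-asciify-char and my-asciify-string-by-map are made up here, the table covers only the letters handled by the function above, and “mapconcat” is used in place of “mapcar” so that the per-char results are joined back into a string.

```elisp
;; Hypothetical lookup table; it covers only the letters handled by
;; the asciify-string function earlier in this post.
(defvar my-asciify-char-table
  '((?á . "a") (?à . "a") (?â . "a") (?ä . "a")
    (?é . "e") (?è . "e") (?ê . "e") (?ë . "e")
    (?í . "i") (?ì . "i") (?î . "i") (?ï . "i")
    (?ó . "o") (?ò . "o") (?ô . "o") (?ö . "o")
    (?ú . "u") (?ù . "u") (?û . "u") (?ü . "u"))
  "Alist mapping a non-ASCII character to its ASCII replacement string.")

(defun my-asciify-char (char)
  "Return the replacement string for CHAR, or CHAR itself as a string."
  (or (cdr (assq char my-asciify-char-table))
      (char-to-string char)))

(defun my-asciify-string-by-map (string)
  "Asciify STRING by transforming each character independently."
  ;; A string is a sequence of characters, so it can be mapped over directly.
  (mapconcat #'my-asciify-char string ""))
```

For example, (my-asciify-string-by-map "passé voilà") returns "passe voila". Each character is looked up on its own, with no state carried from one character to the next.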

 Xah


 
Julian Bradfield
Newsgroups: comp.lang.lisp, comp.emacs
From: Julian Bradfield <j...@inf.ed.ac.uk>
Date: Tue, 8 Mar 2011 08:22:24 +0000 (UTC)
Local: Tues, Mar 8 2011 3:22 am
Subject: Re: emacs lisp: asciify unicode string
On 2011-03-07, Xah Lee <xah...@gmail.com> wrote:
> (defun asciify-string (inputstr)
>   "Make unicode string into equivalent ASCII ones.
> Todo: this command is not exhaustive."
>   (let ()
>    (setq inputstr (replace-regexp-in-string "á\\|à\\|â\\|ä" "a"

 etc.

Yep, it sure is not exhaustive.

The right way to do this is to fix the "open source dictionary",
whatever that is, as anything that doesn't handle Unicode is pretty
much useless in today's world.
Failing that, then the right way to do asciify-string is to use the
Unicode Character Database: convert to NFKD, and then remove all the
combining characters, and all the remaining non-ASCII characters (how
are you going to ascii-fy Russian, Greek, Chinese?).
This is (probably) a couple of lines of perl, but it's not built into Emacs.
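
For readers who want to try the NFKD approach inside Emacs itself, here is a minimal sketch. It assumes the ucs-normalize library that is bundled with recent Emacs versions (it may be absent from older ones), and the function name my-asciify-nfkd is invented for this example. Because combining characters are themselves non-ASCII, a single deletion pass covers both removal steps described above.

```elisp
(require 'ucs-normalize)

(defun my-asciify-nfkd (string)
  "Decompose STRING to NFKD, then delete every remaining non-ASCII character.
Combining accents are non-ASCII, so one deletion pass removes them too."
  (replace-regexp-in-string "[^[:ascii:]]" ""
                            (ucs-normalize-NFKD-string string)))
```

With this, (my-asciify-nfkd "passé") gives "passe". Characters that have no decomposition into ASCII, such as Greek or Chinese text, are simply dropped, which may or may not be what is wanted.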


 
Xah Lee
Newsgroups: comp.lang.lisp, comp.emacs
From: Xah Lee <xah...@gmail.com>
Date: Tue, 8 Mar 2011 00:59:56 -0800 (PST)
Local: Tues, Mar 8 2011 3:59 am
Subject: Re: emacs lisp: asciify unicode string
On Mar 8, 12:22 am, Julian Bradfield <j...@inf.ed.ac.uk> wrote:

that seems a big job.

> This is (probably) a couple of lines of perl, but it's not built into Emacs.

hum... not sure there's a lib for it but i haven't done perl for
years. Is it really a few lines of perl? but maybe hours to write it?
i suppose so if one specializes in unicode processing with perl.

 Xah


 
Teemu Likonen
Newsgroups: comp.emacs
From: Teemu Likonen <tliko...@iki.fi>
Date: Tue, 08 Mar 2011 12:13:51 +0200
Local: Tues, Mar 8 2011 5:13 am
Subject: Re: emacs lisp: asciify unicode string
* 2011-03-08 08:22 (UTC), Julian Bradfield wrote:

> Failing that, then the right way to do asciify-string is to use the
> Unicode Character Database: convert to NFKD, and then remove all the
> combining characters, and all the remaining non-ASCII characters (how
> are you going to ascii-fy Russian, Greek, Chinese?). This is
> (probably) a couple of lines of perl, but it's not built into Emacs.

Also, "iconv" utility can be used for that:

    (defun string-to-ascii (string)
      (with-temp-buffer
        (insert string)
        (call-process-region (point-min) (point-max) "iconv" t t nil
                             "--to-code=ASCII//TRANSLIT")
        (buffer-substring-no-properties (point-min) (point-max))))


 
Julian Bradfield
Newsgroups: comp.lang.lisp, comp.emacs
From: Julian Bradfield <j...@inf.ed.ac.uk>
Date: Tue, 8 Mar 2011 12:31:53 +0000 (UTC)
Local: Tues, Mar 8 2011 7:31 am
Subject: Re: emacs lisp: asciify unicode string
On 2011-03-08, Xah Lee <xah...@gmail.com> wrote:

> On Mar 8, 12:22 am, Julian Bradfield <j...@inf.ed.ac.uk> wrote:
>> This is (probably) a couple of lines of perl, but it's not built into Emacs.

> hum... not sure there's a lib for it but i haven't done perl for
> years. Is it really a few lines of perl? but maybe hours to write it?
> i suppose so if one specializes in unicode processing with perl.

Oh, all right. I've done it. Here's the filter that just removes the
accents from stdin:
perl -e 'use encoding utf8; use Unicode::Normalize; while ( <> ) { $_ = NFKD($_); s/\pM//g; print; }'

This is the first time I've used Perl's Unicode facilities, and it
took me 5 minutes to look up the stuff to write that. (But I do know
Unicode, which helps.)


 
Xah Lee
Newsgroups: comp.lang.lisp, comp.emacs
From: Xah Lee <xah...@gmail.com>
Date: Tue, 8 Mar 2011 16:38:18 -0800 (PST)
Local: Tues, Mar 8 2011 7:38 pm
Subject: Re: emacs lisp: asciify unicode string
On Mar 8, 4:31 am, Julian Bradfield <j...@inf.ed.ac.uk> wrote:

nice. ☺

had to have that smiley though. ☺

humm NFKD. didn't know about that. just read up.
http://en.wikipedia.org/wiki/Unicode_equivalence

too lazy to lookup, but what's the 「\pM」 there?

ps i put your code on my page.

 Xah


 
Xah Lee
Newsgroups: comp.emacs
From: Xah Lee <xah...@gmail.com>
Date: Tue, 8 Mar 2011 16:40:31 -0800 (PST)
Local: Tues, Mar 8 2011 7:40 pm
Subject: Re: emacs lisp: asciify unicode string
On Mar 8, 2:13 am, Teemu Likonen <tliko...@iki.fi> wrote:

nice. Though, i can't use that cause i don't have iconv installed on
my cygwin (Windows). It wasn't in OS X 10.4.x but not sure what about
today.

PS added your code on my page if you don't mind.

 Xah


 
Xah Lee
Newsgroups: comp.lang.lisp, comp.emacs
From: Xah Lee <xah...@gmail.com>
Date: Tue, 8 Mar 2011 16:52:02 -0800 (PST)
Local: Tues, Mar 8 2011 7:52 pm
Subject: Re: emacs lisp: asciify unicode string
On Mar 7, 1:58 pm, Xah Lee <xah...@gmail.com> wrote:

thought a bit more about this yesterday. I think this is actually a
great example of the parallel programing Guy Steele is talking about.

if we do it by mapping a transcode function to each char in the
string, then it'll probably become 100 times slower than it is now.
However, suppose the string is a few million chars long (which is just
a few megabytes), then using map will certainly be much faster,
provided that the elisp compiler/interpreter of the future has become
parallelism aware...

-------------------

... about calling external util... it has the problem of IO limitations,
especially on Windows. e.g. you have to make sure the encoding sent is
the one specified in the external util's input spec, then make sure the
size is within the limit of IO allowed... (i don't know the details, but
i often have problems on Windows when calling util in cygwin. e.g. my
rgrep is even broken. I get the error message “find: unknown predicate
`-nam'”. Apparently the long input sent to shell has been truncated.)
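
One way the encoding concern might be handled is to state the coding system explicitly on both sides of the call. The following is a sketch only: the name my-string-to-ascii-iconv is invented here, it assumes an iconv program that accepts the -f and -t options, and what //TRANSLIT produces differs between iconv implementations. The text is sent through the program's standard input, not on its command line.

```elisp
(defun my-string-to-ascii-iconv (string)
  "Pass STRING through the external iconv program, transliterating to ASCII.
Signal an error if iconv is not found on the system."
  (unless (executable-find "iconv")
    (error "iconv is not installed or not on the search path"))
  (with-temp-buffer
    (insert string)
    ;; Assumption: UTF-8 is used in both directions, and iconv is told so.
    (let ((coding-system-for-write 'utf-8)
          (coding-system-for-read 'utf-8))
      (call-process-region (point-min) (point-max) "iconv" t t nil
                           "-f" "UTF-8" "-t" "ASCII//TRANSLIT"))
    (buffer-substring-no-properties (point-min) (point-max))))
```

The check with “executable-find” turns a missing iconv into a clear error message up front.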

 Xah


 
Julian Bradfield
Newsgroups: comp.lang.lisp, comp.emacs
From: Julian Bradfield <j...@inf.ed.ac.uk>
Date: Wed, 9 Mar 2011 09:38:08 +0000 (UTC)
Local: Wed, Mar 9 2011 4:38 am
Subject: Re: emacs lisp: asciify unicode string
On 2011-03-09, Xah Lee <xah...@gmail.com> wrote:

> too lazy to lookup, but what's the 「\pM」 there?

\pM means characters with the "mark" property. This includes accents,
as well as some other additions to characters. (So that script will
also remove the vowels from Devanagari, which may or may not be
intended.) There are ways to restrict it to Latin characters: if we
only want to remove accents from Latin characters, then:
 s/(\p{Latin})\pM+/$1/g;
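
As a rough Emacs Lisp counterpart to \pM, the "mark" property can be tested through a character's Unicode general category. This is a sketch under assumptions: the names my-mark-char-p and my-remove-marks are invented for this example, and the input is expected to be already decomposed (for example by NFKD), since a precomposed letter such as é is not itself a mark.

```elisp
(defun my-mark-char-p (char)
  "Return non-nil if CHAR has a Unicode mark category (Mn, Mc or Me)."
  (memq (get-char-code-property char 'general-category) '(Mn Mc Me)))

(defun my-remove-marks (string)
  "Return STRING without its mark characters.
STRING should already be decomposed, for example to NFKD."
  ;; Map each mark to nil, drop the nils, and rebuild a string.
  (concat (delq nil (mapcar (lambda (char)
                              (unless (my-mark-char-p char) char))
                            string))))
```

For example, (my-remove-marks (string ?e #x301)) returns "e", because U+0301 (combining acute accent) has category Mn.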

 
Teemu Likonen
Newsgroups: comp.emacs
From: Teemu Likonen <tliko...@iki.fi>
Date: Fri, 11 Mar 2011 12:44:37 +0200
Local: Fri, Mar 11 2011 5:44 am
Subject: Re: emacs lisp: asciify unicode string
* 2011-03-08 16:40 (-0800), Xah Lee wrote:

> On Mar 8, 2:13 am, Teemu Likonen <tliko...@iki.fi> wrote:
>> Also, "iconv" utility can be used for that:

>>     (defun string-to-ascii (string)
>>       (with-temp-buffer
>>         (insert string)
>>         (call-process-region (point-min) (point-max) "iconv" t t nil
>>                              "--to-code=ASCII//TRANSLIT")
>>         (buffer-substring-no-properties (point-min) (point-max))))

> nice. Though, i can't use that cause i don't have iconv installed on
> my cygwin (Windows). It wasn't in OS X 10.4.x but not sure what about
> today.

> PS added your code on my page if you don't mind.

I don't mind having my code there but it annoys me a bit that you
changed the symbol named "string" to "inputStr" while still saying it's
my code. CamelCaps is not good Lisp style and I don't stand behind that
change. Also, having the word "input" in such function's argument is
redundant. Obviously it's about some kind of input because it's in the
function's lambda list (arguments).

 
Discussion subject changed to "naming of variables" by Xah Lee
Xah Lee
Newsgroups: comp.emacs, comp.lang.lisp
From: Xah Lee <xah...@gmail.com>
Date: Fri, 11 Mar 2011 03:48:32 -0800 (PST)
Local: Fri, Mar 11 2011 6:48 am
Subject: naming of variables
2011-03-11

* 2011-03-08 16:40 (-0800), Xah Lee wrote:

> On Mar 8, 2:13 am, Teemu Likonen <tliko...@iki.fi> wrote:
>> Also, "iconv" utility can be used for that:
>>     (defun string-to-ascii (string)
>>       (with-temp-buffer
>>         (insert string)
>>         (call-process-region (point-min) (point-max) "iconv" t t nil
>>                              "--to-code=ASCII//TRANSLIT")
>>         (buffer-substring-no-properties (point-min) (point-max))))
> nice. Though, i can't use that cause i don't have iconv installed on
> my cygwin (Windows). It wasn't in OS X 10.4.x but not sure what about
> today.
> PS added your code on my page if you don't mind.

On Mar 11, 2:44 am, Teemu Likonen <tliko...@iki.fi> wrote:

> I don't mind having my code there but it annoys me a bit that you
> changed the symbol named "string" to "inputStr" while still saying it's
> my code.

I wrote “Code originally by Teemu Likonen.”. I added the word
“originally” there precisely about this worry. ☺

i changed the “inputStr” to “string” now.

> CamelCaps is not good Lisp style

yes. I'm aware. But that's really a point of view.

> and I don't stand behind that
> change.
> Also, having the word "input" in such function's argument is
> redundant. Obviously it's about some kind of input because it's in the
> function's lambda list (arguments).

i don't think it's a big issue, but here's the reason why i'm using
camelCase.

(1) it provides an easy way to distinguish variables from built-in
symbols. Particularly because of the fact that emacs-lisp-mode's
coloring scheme is not full. (i wrote about this problem here
〈Emacs Lisp Mode Syntax Coloring Problem〉
http://xahlee.org/emacs/modernization_elisp_syntax_color.html )

i developed a habit to use camelCase in particular for local variables.
I could've used under_score but that's more typing and less visually
distinguishable from lisp's hyphen-word-style.

(2) For variables, in recent years i developed a habit to avoid any
standard english word, partly as an experiment. So, i'd name “file” as
“myFile” or “aFile”. “string” would be “str”, “myString”, etc. A good
solution in this regard is to append or prepend a random number in var
names. So, “string” would be “string-5w77o” or something like that.
But the problem with this is that it's too long and disruptive in
reading and typing. Recently i've been toying with the idea of
attaching a unicode char to all vars. e.g. all my vars would start with ξ.
So “string” would be “ξstring”. (works fine in elisp btw). This way
solves the random string readability issue.

The reason for this “avoiding english words” is for easy source code
transformation. The idea is similar to the idea of referential
transparency.

imagine, if every local variable (or every symbol) are a unique
identifier in the source code. This way, you could locate any variable
in the whole source code, and you can freely change their names.

(3) another reason that somewhat pushed me in this naming experiment
is that... instead of naming your vars in some meaningful english
words, the opposite is to name them completely random, as in math's x,
y, z.

So, i'd name “counter” as just “i” or “n”. (since these are 1-letter
string and too common, so with the unique naming idea above, i usually
name them “ii” or “nn” or might be “ξi”)

the idea with abstract naming is that it forces you to understand the
code as a math expression that specify algorithm, instead of like
english prose. Readability of source code is helped by coding in a
pure functional programing style (e.g. functions, input, output), and
good documentation of each function. So, to understand a function, you
should just read the doc about its input output. While inside a code
snippet, it is understood by simple functional style programing
constructs.

to illustrate from the opposite view, the problem with english naming
is that often it interferes with what the code is actually doing. For
example, in normal convention often you'll see names like
“thisObject”, “thatTree”, “fileList”, or “files”, your focus is on the
meaning of these words, but not what the data types actually are or the
function's actual mathematical behavior. The words can be deceptive.
e.g. “file” can be a file handle, file path, file content. This is
especially a problem  when you are reading source code of a lang you
do not know. e.g. when you encounter the word “object”, you don't know
if that's a keyword in the language, a pattern spec, something, or
just a variable name. When you read a normal source code, half of the
words are like that unless the editor does syntax coloring that
distinguish the lang's keyword.

to view this idea in another way ... when you read math, you never see
mathematician name their variables with a multi-letter descriptive
word, but usually a single symbol, yet there's no problem
understanding the expression. Your focus and understanding is on the
abstract process and structure.

again, the above ideas are just an experiment. Without actually doing
it, one never knows what's really good or bad.

 Xah ∑ http://xahlee.org/


 
Discussion subject changed to "emacs lisp: asciify unicode string" by Xah Lee
Xah Lee
Newsgroups: comp.lang.lisp, comp.emacs
From: Xah Lee <xah...@gmail.com>
Date: Fri, 11 Mar 2011 14:07:59 -0800 (PST)
Local: Fri, Mar 11 2011 5:07 pm
Subject: Re: emacs lisp: asciify unicode string
Here's a more polished solution. The “asciify-word-or-selection” is a
command. It works on current word or text selection.

(defun asciify-string (inputstr)
"Make Unicode string into equivalent ASCII ones.
For example, “passé” becomes “passe”.
This function works on chars in European languages, and does
not transcode arbitrary unicode chars (such as Greek).
Un-transformed unicode char remains in the string."
  (let ()
    (setq inputstr (replace-regexp-in-string "á\\|à\\|â\\|ä\\|ã\\|å" "a" inputstr))
    (setq inputstr (replace-regexp-in-string "é\\|è\\|ê\\|ë" "e" inputstr))
    (setq inputstr (replace-regexp-in-string "í\\|ì\\|î\\|ï" "i" inputstr))
    (setq inputstr (replace-regexp-in-string "ó\\|ò\\|ô\\|ö\\|õ\\|ø" "o" inputstr))
    (setq inputstr (replace-regexp-in-string "ú\\|ù\\|û\\|ü" "u" inputstr))
    (setq inputstr (replace-regexp-in-string "ñ" "n" inputstr))
    (setq inputstr (replace-regexp-in-string "ç" "c" inputstr))
    (setq inputstr (replace-regexp-in-string "ð" "d" inputstr))
    (setq inputstr (replace-regexp-in-string "þ" "th" inputstr))
    (setq inputstr (replace-regexp-in-string "ß" "ss" inputstr))
    (setq inputstr (replace-regexp-in-string "æ" "ae" inputstr))
    inputstr))

(defun asciify-word-or-selection ()
  "Make Unicode string into equivalent ASCII ones.
For example, “passé” becomes “passe”.
This command works on chars in European languages, and does
not transcode arbitrary unicode chars (such as Greek).
They remain in the string.
This command calls `asciify-string' to do the string transformation."
  (interactive)
  (let (bds p1 p2 inputstr)
    (setq bds (get-selection-or-unit 'word))
    (setq inputstr (elt bds 0) p1 (elt bds 1) p2 (elt bds 2)  )
    (setq inputstr (asciify-string inputstr))
    (delete-region p1 p2 )
    (insert inputstr)
    ))

The command uses “get-selection-or-unit”. You can get the code for
that at Emacs Lisp: Using thing-at-point.
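
Since “get-selection-or-unit” is not part of Emacs, here is a rough sketch of the same "selection, else current word" idea using only built-in functions. The name my-apply-to-region-or-word is made up for this example, and the transform is passed in as an argument so that the block does not depend on any code above.

```elisp
(require 'thingatpt)

(defun my-apply-to-region-or-word (fn)
  "Replace the active region, or else the word at point, with (FN text).
FN is a function taking one string and returning a string."
  (let ((bounds (if (use-region-p)
                    (cons (region-beginning) (region-end))
                  (bounds-of-thing-at-point 'word))))
    (unless bounds
      (error "No active region and no word at point"))
    (let* ((beg (car bounds))
           (end (cdr bounds))
           (new (funcall fn (buffer-substring-no-properties beg end))))
      ;; Swap the old text for the transformed text in place.
      (delete-region beg end)
      (goto-char beg)
      (insert new))))
```

A command could wrap this with (interactive) and pass `asciify-string' as FN. For instance, with point on the word “hello”, (my-apply-to-region-or-word #'upcase) turns it into “HELLO”.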

 Xah


 
Discussion subject changed to "naming of variables" by Tim X
Tim X
Newsgroups: comp.emacs, comp.lang.lisp
From: Tim X <t...@nospam.dev.null>
Date: Sat, 12 Mar 2011 11:12:58 +1100
Local: Fri, Mar 11 2011 7:12 pm
Subject: Re: naming of variables

It is good to experiment - it's how we learn. However, I think it is
unlikely that the experimentation and experiences of a single person are
going to be of any real benefit to anyone other than the individual
concerned. The first stage in any experimentation should be to scan the
literature and become familiar with the current body of knowledge in the
area. Failing to do so means the experiments are unlikely to be of much
real interest to others. Writing good clear code is a real skill that
takes years to develop and refine. Reading and learning from others' code
is extremely beneficial in helping to develop good technique. It is rare
that an author's first book or an artist's first painting represents
their best work. It is even rarer for an author or artist to create a
masterpiece the first time without also having studied the works of
others and having a solid grasp of both the theoretical and practical
aspects of the discipline. Practice/experience and appreciation of
others' work is essential. In many ways, this is an example of where you
really need to know and understand the existing conventions/rules before
you can break/change them.

There are lots of articles and books concerning this topic and a lot of
research has been done in this area. There are many 'formal' techniques
that have been developed with varying levels of success. You could
gain some valuable insight by looking at some of the research done in the
area of software engineering rather than working from 'first
principles' and read about techniques used in large software projects -
for example, while I don't like MS conventions in this area, reading
about them provides some good insight into the problems they are
attempting to solve and how they feel their solution achieves this.

I would suggest many of your ideas have a consistent weakness in that
they overlook one of the main objectives of code - to communicate your
ideas, algorithm etc to others as well as to yourself. One way to
evaluate your technique is to regularly re-visit code you wrote some
time ago and see how easily you can understand it without using comments
and documentation and seek feedback from others regarding how easily they
can understand what you have written.

I also think technique and style cannot be divorced from the language
being used. The way I write depends very much on the dialect being
used. For example, in perl, I have used variable naming techniques where
the name indicates whether the variable is a scalar, hash, reference
etc, in C I may use names that indicate whether the variable is a
pointer or not and in Java - well, actually I just avoid java because of
the ridiculously verbose nature of its naming conventions which just end
up being bloated boilerplate noise.

In lisp, I do tend to avoid using variable names that are the same as
important keywords. For example, last week I was debugging some code
written by someone else that had code along the lines of

(let ((menu-alist
        '((....
           ....)
      cons (vec (vector 'rootmenu 'vm nil)))
   ....
   ....)

which I found an unfortunate bit of code because the name and position
of the 'cons' variable meant I had to stop and re-read this declaration
to recognise that in this context, 'cons' is a variable, not a function.
Worse still, at this point, I have no idea what either 'cons' or 'vec'
are supposed to represent. All I know is that 'cons' is probably a cons
cell (a guess at this point) and vec is a vector (obvious from its
definition). I don't know what the cons cell represents or what the vec
is used for. Matters were made worse by the unfortunate layout/indenting
used. Either putting 'cons' as the last variable or putting the
definition on its own line may have helped make it easier to read
(putting it as the first variable could also have made it even harder!)

The point is, the above code is readable, but it requires additional
mental effort that could easily have been avoided by using better layout,
meaningful variable names and avoiding names that are the same as a
frequently used function name.

I don't use camel case in lisp because it is a case insensitive
language. I want my function names and variables to have the same level
of distinctiveness in the repl, debugger and backtraces as they do in
source code. I do use the ear-muff convention for special variables to
remind me they are special. I tend to avoid using variable names that
are the same as keyword/function names, unless not doing so would make
the code harder to understand (i.e. I've been known to use the ...



 
Discussion subject changed to "emacs lisp: asciify unicode string" by Teemu Likonen
Teemu Likonen
Newsgroups: comp.lang.lisp, comp.emacs
From: Teemu Likonen <tliko...@iki.fi>
Date: Sat, 12 Mar 2011 08:51:36 +0200
Local: Sat, Mar 12 2011 1:51 am
Subject: Re: emacs lisp: asciify unicode string
* 2011-03-11 14:07 (-0800), Xah Lee wrote:

That's ugly. Here's even more polished version:

(defmacro with-replace-regexp-series (string &rest clauses)
  (declare (indent 1))
  (let ((value (make-symbol "--value--")))
    `(let ((,value ,string))
       ,@(let (forms)
           (dolist (clause clauses (nreverse forms))
             (push `(setq ,value (replace-regexp-in-string
                                  ,(car clause) ,(cadr clause) ,value))
                   forms))))))

(defun asciify-string (string)
  (with-replace-regexp-series string
    ("á\\|à\\|â\\|ä\\|ã\\|å" "a")
    ("é\\|è\\|ê\\|ë" "e")
    ("í\\|ì\\|î\\|ï" "i")
    ("ó\\|ò\\|ô\\|ö\\|õ\\|ø" "o")
    ("ú\\|ù\\|û\\|ü" "u")
    ("ñ" "n")
    ("ç" "c")
    ("ð" "d")
    ("þ" "th")
    ("ß" "ss")
    ("æ" "ae")))


 
Teemu Likonen
Newsgroups: comp.lang.lisp, comp.emacs
From: Teemu Likonen <tliko...@iki.fi>
Date: Sat, 12 Mar 2011 09:02:53 +0200
Local: Sat, Mar 12 2011 2:02 am
Subject: Re: emacs lisp: asciify unicode string
* 2011-03-11 14:07 (-0800), Xah Lee wrote:

Here's even more polished version:

(defmacro replace-regexp-series (string &rest clauses)
  (declare (indent 1))
  (let ((value (make-symbol "--value--")))
    `(let ((,value ,string))
       ,@(let (forms)
           (dolist (clause clauses (nreverse forms))
             (push `(setq ,value (replace-regexp-in-string
                                  ,(car clause) ,(cadr clause) ,value))
                   forms))))))

(defun asciify-string (string)
  (replace-regexp-series string
    ("[áàâäãå]" "a")
    ("[éèêë]" "e")
    ("[íìîï]" "i")
    ("[óòôöõø]" "o")
    ("[úùûü]" "u")
    ("ñ" "n")
    ("ç" "c")
    ("ð" "d")
    ("þ" "th")
    ("ß" "ss")
    ("æ" "ae")))


 
Discussion subject changed to "naming of variables" by Xah Lee
Xah Lee
Newsgroups: comp.emacs, comp.lang.lisp
From: Xah Lee <xah...@gmail.com>
Date: Sat, 12 Mar 2011 17:47:35 -0800 (PST)
Local: Sat, Mar 12 2011 8:47 pm
Subject: Re: naming of variables

2011-03-12

Xah Lee wrote:

│ 〈Programing Style: Variable Naming: English Words Considered Harmful〉
http://xahlee.org/comp/programing_variable_naming.html

(edited and expanded above for clarity)

On Mar 11, 4:12 pm, Tim X wrote:
│ It is good to experiment - it's how we learn. However, I think it is
│ unlikely that the experimentation and experiences of a single person
│ are going to be of any real benefit to anyone other than the
│ individual concerned.

yes. But for every undertaking there's a first step. ☺

│ The first stage in any experimentation should be to scan the
│ literature and become familiar with the current body of knowledge in
│ the area. Failing to do so means the experiments are unlikely to be of
│ much real interest to others.

That's the thought pattern of many hard core tech geekers, and was my
approach too in much of the 1990s and early 2000s.

Bertrand Russell has written about this. When he was young, he
thought that to study anything he would first do a complete survey of
existing knowledge, then venture on discovery. However, he found out
that this is actually not effective. Rather, if you just start exploration
and push out your findings, you'll have more impact, and perhaps
overall more understanding.

He wrote something to that effect, i forgot in what essay or lecture
or publication, and it's hard to web search. (might be parts of his
collected auto-bio) Anyone knows?

if you look at programing languages or the computer industry... you'll
find this to be true (albeit not in some scientific way). Namely, langs
such as perl, python, ruby, php, C-turd, or whatnot that cropped up and
became popular are not the result of the designer having a deep
understanding of programing langs or doing a massive survey of the
varieties out there. Rather, they just pushed themselves forward, then
later on concocted “philosophies” to back it up. Meanwhile, langs from
designers who really studied existing langs thoroughly (or tried to)
before they published their invention are usually not the successful
ones. Typically these are the academicians. (of course their lack of
salesmanship probably has much to do with it too.)

thinking about this, i think actually most lang inventors did not know
the varieties of prog langs when they invented their own. This makes
sense too. If everyone takes the tech geeker stance of studying
existing knowledge before doing, the world wouldn't move.

I know you, Tim X. I remember you from maybe 4 or 5 years ago as
someone who thought i was a dork for my emacs opinions. I think in
recent years my image has improved, perhaps slightly, and i appreciate
it.

Will probably write a separate post about some thoughts on programing
style.

 Xah


 
Discussion subject changed to "emacs lisp: asciify unicode string" by Xah Lee
Xah Lee  
Newsgroups: comp.lang.lisp, comp.emacs
From: Xah Lee <xah...@gmail.com>
Date: Sat, 12 Mar 2011 17:54:32 -0800 (PST)
Local: Sat, Mar 12 2011 8:54 pm
Subject: Re: emacs lisp: asciify unicode string
On Mar 11, 11:02 pm, Teemu Likonen <tliko...@iki.fi> wrote:

egads, macros. Unreadable, or, at least i don't think i'll ever
understand it. ☺

though, i think this points to a common problem in elisp: find/replace
with multiple pairs.

i wrote a package on this.

〈Emacs Lisp Package: Multi-Pair String Replacement:
xfrp_find_replace_pairs.el〉
http://xahlee.org/emacs/elisp_replace_string_region.html

what do you think?

one problem i thought is interesting is about a feedback loop. That
is, if your code does the find/replace recursively, then some string
in the input text may be unexpectedly replaced, even if that string
isn't anywhere in the find string.

For example, if the input string is “abcd”, and the pairs are “a → c”
and “c → d”, then, result is “dbdd”, though most of the time you want
“cbdd”. This is especially important if you use regex in your find
string.

how does your code handle this?

the solution i used is an intermediate replacement. That is, take each
find string, replace it with some random string that's not likely to
occur in the text, then replace this random string with the
replacement string.
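
A minimal sketch of that two-pass idea, for literal find strings only. The function name `my-replace-pairs-via-placeholders` is made up for illustration and this is not the code from the package linked above. Instead of random strings it uses Unicode private-use characters (U+E000 and up) as placeholders, on the assumption that the input text never contains them.

```elisp
;; Hypothetical helper for illustration; not the code from
;; xfrp_find_replace_pairs.el.
(defun my-replace-pairs-via-placeholders (str pairs)
  "Replace each (FIND . REPLACEMENT) of PAIRS in STR, treating FIND literally.
Pass 1 swaps every FIND for a unique private-use placeholder character.
Pass 2 swaps each placeholder for its REPLACEMENT.
Assumes STR contains no private-use characters from U+E000 upward."
  (let ((i 0)
        (placeholders '()))
    ;; Pass 1: find string -> unique placeholder.
    (dolist (pair pairs)
      (let ((ph (string (+ #xE000 i))))
        (push (cons ph (cdr pair)) placeholders)
        (setq str (replace-regexp-in-string
                   (regexp-quote (car pair)) ph str t t))
        (setq i (1+ i))))
    ;; Pass 2: placeholder -> final replacement text.
    (dolist (entry placeholders)
      (setq str (replace-regexp-in-string
                 (regexp-quote (car entry)) (cdr entry) str t t)))
    str))

(my-replace-pairs-via-placeholders "abcd" '(("a" . "c") ("c" . "d")))
;; => "cbdd"
```

Because the replacement text only appears in pass 2, a later pair can never match the output of an earlier one. If two find strings overlap in the input, the pair listed first wins.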

 Xah


 
Teemu Likonen  
Newsgroups: comp.lang.lisp, comp.emacs
From: Teemu Likonen <tliko...@iki.fi>
Date: Sun, 13 Mar 2011 10:07:16 +0200
Local: Sun, Mar 13 2011 4:07 am
Subject: Re: emacs lisp: asciify unicode string
* 2011-03-12 17:54 (-0800), Xah Lee wrote:

> On Mar 11, 11:02 pm, Teemu Likonen <tliko...@iki.fi> wrote:
>> (defmacro replace-regexp-series (string &rest clauses)
>>   (declare (indent 1))
>>   (let ((value (make-symbol "--value--")))
>>     `(let ((,value ,string))
>>        ,@(let (forms)
>>            (dolist (clause clauses (nreverse forms))
>>              (push `(setq ,value (replace-regexp-in-string
>>                                   ,(car clause) ,(cadr clause) ,value))
>>                    forms))))))
> egads, macros. Unreadable, or, at least i don't think i'll ever
> understand it. ☺

The point behind my version was to hide boring repetitive code away. I
made a syntactic abstraction (a macro) but of course there are other
ways too. I mean, why did you write a lot of expressions like

    (setq inputstr (replace-regexp-in-string "ñ" "n" inputstr))

when you could have written a loop, for example?
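
Such a loop might look roughly like this. It is only a sketch: the function name `my-asciify-with-loop` and the shortened pair table are made up for illustration, and the full table would list all the pairs from the original function.

```elisp
;; Hypothetical loop version; the table below is deliberately incomplete.
(defun my-asciify-with-loop (inputstr)
  "Return INPUTSTR with a few accented letters replaced by ASCII ones."
  (dolist (pair '(("á\\|à\\|â\\|ä\\|ã\\|å" . "a")
                  ("é\\|è\\|ê\\|ë" . "e")
                  ("ñ" . "n")
                  ("ç" . "c"))
                inputstr)               ; value returned after the loop
    ;; Each step rewrites the result of the previous step.
    (setq inputstr
          (replace-regexp-in-string (car pair) (cdr pair) inputstr))))

(my-asciify-with-loop "passé")
;; => "passe"
```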

> one problem i thought is interesting is about a feedback loop. That
> is, if your code does the find/replace recursively, then some string
> in the input text may be unexpectedly replaced, even if that string
> isn't anywhere in the find string.

> For example, if the input string is “abcd”, and the pairs are “a → c”
> and “c → d”, then, result is “dbdd”, though most of the time you want
> “cbdd”. This is especially important if you use regex in your find
> string.

> how does your code handle this?

The macro was just a syntactic abstraction of your code. That is, it
expands to a series of expressions which are equal to yours. No change
in the functionality.

    (macroexpand
     '(replace-regexp-series "öljyä"
        ("ä" "a")
        ("ö" "o")))

will return

    (let ((--value-- "öljyä"))
      (setq --value-- (replace-regexp-in-string "ä" "a" --value--))
      (setq --value-- (replace-regexp-in-string "ö" "o" --value--)))

That's the code that will be evaluated at runtime. Here --value-- is an
uninterned symbol so it can't be seen anywhere else.
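
A small sketch of what "uninterned" means here. The function name `my-uninterned-symbol-demo` is made up for illustration. A symbol created with `make-symbol` has the same print name as the ordinary symbol `--value--` but is a different object, so user code that happens to use a variable called `--value--` cannot collide with it.

```elisp
;; Hypothetical demo function, for illustration only.
(defun my-uninterned-symbol-demo ()
  "Compare a fresh uninterned symbol with the interned symbol of the same name."
  (let ((fresh (make-symbol "--value--")))
    (list
     ;; Not the same object as the interned symbol the reader produces.
     (eq fresh '--value--)
     ;; But the print names are identical.
     (string= (symbol-name fresh) "--value--"))))

(my-uninterned-symbol-demo)
;; => (nil t)
```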


 
Xah Lee  
Newsgroups: comp.lang.lisp, comp.emacs
From: Xah Lee <xah...@gmail.com>
Date: Sun, 13 Mar 2011 03:57:15 -0700 (PDT)
Local: Sun, Mar 13 2011 6:57 am
Subject: Re: emacs lisp: asciify unicode string
2011-03-13

On Mar 13, 12:07 am, Teemu Likonen <tliko...@iki.fi> wrote:

hi Teemu,

yes i understand. Still, here's few interesting points:

• emacs lisp needs this built in as an elisp function, because
multi-pair replacement is common.

• i have written this abstraction already. Here:
http://xahlee.org/emacs/elisp_replace_string_region.html

• the way i did it is also by a loop, but using “while”, not a macro.

• there is an issue with multi-pair find/replace. Namely, if you
simply do each find/replace pair one after another, you may get an
unexpected result. I detailed this problem in my earlier post. I named
it the feedback loop problem.
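
For literal find strings (not regexps), one way to avoid the feedback loop is to do everything in a single pass, so replacement text is never scanned again. The sketch below assumes that restriction. The name `my-replace-pairs-single-pass` is made up for illustration, and this is not necessarily how the package linked above works.

```elisp
;; Hypothetical helper; handles literal find strings only.
(defun my-replace-pairs-single-pass (str pairs)
  "Replace literal find strings in STR in one left-to-right pass.
PAIRS is an alist of (FIND . REPLACEMENT).  Text inserted as a
replacement is never rescanned, so pairs cannot feed into each other."
  (let ((case-fold-search nil))     ; match exactly, so `assoc' finds the key
    (replace-regexp-in-string
     (regexp-opt (mapcar #'car pairs)) ; one regexp matching any find string
     (lambda (match) (cdr (assoc match pairs)))
     str t t)))

(my-replace-pairs-single-pass "abcd" '(("a" . "c") ("c" . "d")))
;; => "cbdd"
```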

Was wondering about your thoughts on these.

also, about macros, opinions differ. For this particular problem, I
think the macro solution is rather ugly and hard to understand. I think
the explicit sequential replace string call is easier to understand. I
think the only advantage of the macro solution is shrinking the size of
the source code.

i started functional programing in Mathematica around 1992. Like most
FP geeks, i've read and explored extensively the FP paradigms,
especially on toy problems of lists. I really love all the abstract
ways, but i realized sometime in the mid 2000s that many of these
abstractions that functional programers love to chat about (e.g.
Schemers) are actually harmful. (on the other hand, i'm not sure
non-academic production code in functional langs actually does that
much abstraction for the sake of abstraction/elegance/purity)

...

on rewriting this particular piece of code, there's this way, which i
find rather more interesting:

(replace-regexp-in-string "á\\|à\\|â\\|ä\\|ã\\|å" "a"
 (replace-regexp-in-string "é\\|è\\|ê\\|ë" "e"
  (replace-regexp-in-string "í\\|ì\\|î\\|ï" "i"
   (replace-regexp-in-string "ó\\|ò\\|ô\\|ö\\|õ\\|ø" "o"
    (replace-regexp-in-string "ú\\|ù\\|û\\|ü" "u"
     (replace-regexp-in-string "ñ" "n"
      (replace-regexp-in-string "ç" "c"
       (replace-regexp-in-string "ð" "d"
        (replace-regexp-in-string "þ" "th"
         (replace-regexp-in-string "ß" "ss"
          (replace-regexp-in-string "æ" "ae" inputstr)))))))))))

but there are 2 problems with it that are interesting to note.

• typical lispers will probably find the code abominable. This is, in
my opinion, a problem caused by nested syntax. The code is really the
function chaining paradigm (aka pipe, filter).

• the length of the code and its deep nesting is again a problem of
nested syntax when there is no automatic formatting facility.

 Xah


 
Teemu Likonen  
Newsgroups: comp.lang.lisp, comp.emacs
From: Teemu Likonen <tliko...@iki.fi>
Date: Sun, 13 Mar 2011 18:36:56 +0200
Local: Sun, Mar 13 2011 12:36 pm
Subject: Re: emacs lisp: asciify unicode string
* 2011-03-12 17:54 (-0800), Xah Lee wrote:

> one problem i thought is interesting is about a feedback loop. That
> is, if your code does the find/replace recursively, then some string
> in the input text may be unexpectedly replaced, even if that string
> isn't anywhere in the find string.
> For example, if the input string is “abcd”, and the pairs are “a → c”
> and “c → d”, then, result is “dbdd”, though most of the time you want
> “cbdd”. This is especially important if you use regex in your find
> string.
> the solution i did is to do a intermediate replacement. That is, take
> find string, replace it with some random string that's not likely to
> occur in text, then replace this random string to the replacement
> string.

The problem should be defined more accurately. So, in string “abcd” the
first regexp-replacement pair is “b” → “/” and the second is “^.+$” →
“XXX”. What should the result be? What string should the second regexp
match use? Maybe it should use all the (sub)strings which were left
untouched by the first replace? Is the result “XXX/XXX” or something
else?
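
For reference, here is what plain sequential replacement does with that example. It is only a sketch, and the helper name `my-sequential-replace` is made up for illustration. The second regexp sees the output of the first, so it matches the whole string.

```elisp
;; Hypothetical helper: apply (REGEXP . REPLACEMENT) pairs one after another.
(defun my-sequential-replace (str pairs)
  "Apply each (REGEXP . REPLACEMENT) of PAIRS to STR in order."
  (dolist (pair pairs str)
    ;; Each pair works on the result of the previous pair.
    (setq str (replace-regexp-in-string (car pair) (cdr pair) str t))))

(my-sequential-replace "abcd" '(("b" . "/") ("^.+$" . "XXX")))
;; => "XXX"
```

Whether “XXX”, “XXX/XXX”, or something else is the right answer depends on how the multi-pair operation is defined, which is the open question here.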

 
Xah Lee  
Newsgroups: comp.lang.lisp, comp.emacs
From: Xah Lee <xah...@gmail.com>
Date: Sun, 13 Mar 2011 23:38:21 -0700 (PDT)
Local: Mon, Mar 14 2011 2:38 am
Subject: Re: emacs lisp: asciify unicode string
On Mar 13, 9:36 am, Teemu Likonen <tliko...@iki.fi> wrote:

good point. I haven't thought about that.

 Xah


 