I want to write a function which converts the letters a-z (lower or uppercase) to the integer 0; f,g,h,j,k (note absence of i) to the integer 1; l-p to 2; q-u to 3 and v-z to 4.
Ignoring the annoyance of the missing i, a floor operation, and an upcase, this boils down to mapping the letters to consecutive numbers.
The spec tells me that partial ordering of characters is guaranteed, but points out that contiguity is not.
How should I go about writing this function in an implementation independent way ?
Jacek Generowicz <jacek.generow...@cern.ch> writes:
> I want to write a function which converts the letters a-z (lower or
> uppercase) to the integer 0; f,g,h,j,k (note absence of i) to the
> integer 1; l-p to 2; q-u to 3 and v-z to 4.
>
> Ignoring the annoyance of the missing i, a floor operation, and an
> upcase, this boils down to mapping the letters to consecutive numbers.
>
> The spec tells me that partial ordering of characters is guaranteed,
> but points out that contiguity is not.
>
> How should I go about writing this function in an implementation
> independent way ?
(defmacro in (c spec)
  `(in-test ,c ',(first spec) ,@(rest spec)))

(defun funky-test (c)
  (cond ((in c (range #\a #\e)) 0)
        ((in c (range #\f #\h)) 1)
        ((in c (set-of #\j #\k)) 1)
        ((in c (range #\l #\p)) 2)
        ((in c (range #\q #\u)) 3)
        ((in c (range #\v #\z)) 4)

        ((in c (range #\A #\E)) 0)
        ((in c (range #\F #\H)) 1)
        ((in c (set-of #\J #\K)) 1)
        ((in c (range #\L #\P)) 2)
        ((in c (range #\Q #\U)) 3)
        ((in c (range #\V #\Z)) 4)

        (t ; I.e. `(in c (set-of #\i #\I))
         (error "Got an ~C." c))))
Untested. You can probably do it in easier, more compact, or faster ways (note that `(not (eq 'compact 'easier))'). But this example exercises the language in several nice ways.
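The IN macro above expands into a call to IN-TEST, whose definition does not appear in the post as shown. A minimal sketch of what such a helper could look like follows. It assumes that RANGE means inclusive bounds compared with CHAR<= and that SET-OF means explicit membership; both readings are assumptions, not Marco's code.

```lisp
;; Hypothetical reconstruction: IN-TEST is not shown in the original post.
(defgeneric in-test (c kind &rest args)
  (:documentation "True if character C is in the set named by KIND and ARGS."))

(defmethod in-test (c (kind (eql 'range)) &rest args)
  ;; (range low high): inclusive bounds, compared with CHAR<=.
  (destructuring-bind (low high) args
    (char<= low c high)))

(defmethod in-test (c (kind (eql 'set-of)) &rest args)
  ;; (set-of c1 c2 ...): explicit membership test.
  (and (member c args :test #'char=) t))

;; Same macro as in the post:
;; (in c (range #\a #\e)) expands to (in-test c 'range #\a #\e)
(defmacro in (c spec)
  `(in-test ,c ',(first spec) ,@(rest spec)))
```

With these definitions, (in #\c (range #\a #\e)) is true and (in #\i (set-of #\j #\k)) is false. On an implementation whose letter codes are not contiguous, a CHAR<= range could in principle admit non-letters, so the explicit SET-OF form is the safer of the two.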
Cheers
-- Marco Antoniotti ======================================================== NYU Courant Bioinformatics Group tel. +1 - 212 - 998 3488 719 Broadway 12th Floor fax +1 - 212 - 995 4122 New York, NY 10003, USA http://bioinformatics.cat.nyu.edu "Hello New York! We'll do what we can!" Bill Murray in `Ghostbusters'.
In article <tyfvgc75qm0....@pcitapi22.cern.ch>, Jacek Generowicz <jacek.generow...@cern.ch> wrote:
>I want to write a function which converts the letters a-z (lower or
>uppercase) to the integer 0; f,g,h,j,k (note absence of i) to the
>integer 1; l-p to 2; q-u to 3 and v-z to 4.
Did you mean to say a-e goes to 0?
>Ignoring the annoyance of the missing i, a floor operation, and an upcase, this
>boils down to mapping the letters to consecutive numbers.
>
>The spec tells me that partial ordering of characters is guaranteed,
>but points out that contiguity is not.
>
>How should I go about writing this function in an implementation
>independent way ?
Sounds like a COND statement containing char>= and char<= predicates would do it:
(cond ((char<= #\a letter #\e) 0) ...)
Another possibility is filling in a hash table with all the mappings.
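Both suggestions can be sketched briefly. The function and variable names below (LETTER-BAND, *LETTER-BANDS*, LETTER-BAND-HASH) are made up for illustration, and the mapping uses the corrected spec in which a-e goes to 0. This is one possible reading of the advice, not Barry's own code.

```lisp
;; Variant 1: COND with CHAR<= range tests on the upcased letter.
;; Caveat: if letter codes are not contiguous, a range test could
;; also accept a non-letter that happens to sit between the bounds.
(defun letter-band (letter)
  "Map a grid letter to 0-4; I (either case) is rejected."
  (let ((c (char-upcase letter)))
    (cond ((char<= #\A c #\E) 0)
          ((or (char<= #\F c #\H) (char<= #\J c #\K)) 1)
          ((char<= #\L c #\P) 2)
          ((char<= #\Q c #\U) 3)
          ((char<= #\V c #\Z) 4)
          (t (error "~S is not a grid letter." letter)))))

;; Variant 2: a hash table filled once with every mapping.
;; The default EQL test is adequate for character keys.
(defparameter *letter-bands*
  (let ((table (make-hash-table)))
    (loop for row in '("ABCDE" "FGHJK" "LMNOP" "QRSTU" "VWXYZ")
          for band from 0
          do (loop for ch across row
                   do (setf (gethash ch table) band
                            (gethash (char-downcase ch) table) band)))
    table))

(defun letter-band-hash (letter)
  (or (gethash letter *letter-bands*)
      (error "~S is not a grid letter." letter)))
```

The hash-table variant makes no assumption about character codes at all, at the cost of building the table up front.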
-- Barry Margolin, bar...@genuity.net Genuity, Woburn, MA *** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups. Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.
Marco Antoniotti <marc...@cs.nyu.edu> writes:
> Jacek Generowicz <jacek.generow...@cern.ch> writes:
>
> > I want to write a function which converts the letters a-z (lower or
> > uppercase) to the integer 0; f,g,h,j,k (note absence of i) to the
> > integer 1; l-p to 2; q-u to 3 and v-z to 4.
> >
> > Ignoring the annoyance of the missing i, a floor operation, and an
> > upcase, this boils down to mapping the letters to consecutive numbers.
> >
> > The spec tells me that partial ordering of characters is guaranteed,
> > but points out that contiguity is not.
> >
> > How should I go about writing this function in an implementation
> > independent way ?
>
> (defun funky-test (c)
>   (cond ((in c (range #\a #\e)) 0)
>         ((in c (range #\f #\h)) 1)
>         ((in c (set-of #\j #\k)) 1)
>         ((in c (range #\l #\p)) 2)
>         ((in c (range #\q #\u)) 3)
>         ((in c (range #\v #\z)) 4)
>
>         ((in c (range #\A #\E)) 0)
>         ((in c (range #\F #\H)) 1)
>         ((in c (set-of #\J #\K)) 1)
>         ((in c (range #\L #\P)) 2)
>         ((in c (range #\Q #\U)) 3)
>         ((in c (range #\V #\Z)) 4)
>
>         (t ; I.e. `(in c (set-of #\i #\I))
>          (error "Got an ~C." c))))
>
> Untested. You can probably do it in easier, more compact, or faster ways
> (note that `(not (eq 'compact 'easier))'). But this example
> exercises the language in several nice ways.
Food for thought there, thank you.
If my aim were clarity and conciseness rather than exercising the language I guess I might develop your idea thus:
As it turns out, I also need a second set of values. Imagine the letters appear on a grid:
  V W X Y Z
  Q R S T U
  L M N O P
  F G H J K
  A B C D E
and we are looking for _both_ coordinates. Getting the x coordinate in this way looks like a lot more hassle, and thus the idea of assigning consecutive values to the letters (omitting i) and using floor is appealing.
* Jacek Generowicz <jacek.generow...@cern.ch>
| I want to write a function which converts the letters a-z (lower or
| uppercase) to the integer 0; f,g,h,j,k (note absence of i) to the
| integer 1; l-p to 2; q-u to 3 and v-z to 4.
|
| Ignoring the annoyance of the missing i, a floor operation, and an
| upcase, this boils down to mapping the letters to consecutive numbers.
Huh?
| The spec tells me that partial ordering of characters is guaranteed, but
| points out that contiguity is not.
So? Who cares?
| How should I go about writing this function in an implementation
| independent way ?
Barry Margolin <bar...@genuity.net> writes:
> In article <tyfvgc75qm0....@pcitapi22.cern.ch>,
> Jacek Generowicz <jacek.generow...@cern.ch> wrote:
> >I want to write a function which converts the letters a-z (lower or
> >uppercase) to the integer 0; f,g,h,j,k (note absence of i) to the
> >integer 1; l-p to 2; q-u to 3 and v-z to 4.
> Did you mean to say a-e goes to 0?
I did. Sorry.
> >Ignoring the annoyance of the missing i, a floor operation, and an
> >upcase, this boils down to mapping the letters to consecutive
> >numbers.
> >
> >The spec tells me that partial ordering of characters is guaranteed,
> >but points out that contiguity is not.
> >
> >How should I go about writing this function in an implementation
> >independent way ?
>
> Sounds like a COND statement containing char>= and char<= predicates
> would do it:
>
> (cond ((char<= #\a letter #\e) 0)
>       ...)
Yes, see my reply to Marco for a better and fuller description of the problem, which shows why this has its problems.
> Another possibility is filling in a hash table with all the mappings.
Yes . . . but I was hoping to find a more interesting solution.
In article <tyfr8mv3tix....@pcitapi22.cern.ch>, Jacek Generowicz <jacek.generow...@cern.ch> wrote:
>As it turns out, I also need a second set of values. Imagine the
>letters appear on a grid:
>
>  V W X Y Z
>  Q R S T U
>  L M N O P
>  F G H J K
>  A B C D E
>
>and we are looking for _both_ coordinates. Getting the x coordinate in
>this way looks like a lot more hassle, and thus the idea of assigning
>consecutive values to the letters (omitting i) and using floor is
>appealing.
-- Barry Margolin
(defun char-coords (char)
  "Returns two values: X and Y coordinates of CHAR in the letter grid."
  (setq char (char-upcase char))
  (let ((pos (position char +grid-letters+)))
    (unless pos
      (error "CHAR ~S is not in the grid." char))
    ;; Our values are the opposite order of FLOORs
    (multiple-value-bind (quotient remainder) (floor pos 5)
      (values remainder quotient))))
-- Barry Margolin
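The definition of +GRID-LETTERS+ is not shown alongside CHAR-COORDS. A self-contained sketch of the same technique (POSITION in a string, then FLOOR by the row length) is given below. It assumes the constant holds the 25 grid letters in alphabetical order, so that the A-E row has Y = 0 as in the grid Jacek first described. That value is a guess for illustration only.

```lisp
;; Assumed contents: the original definition of +GRID-LETTERS+ is not shown.
;; DEFPARAMETER is used in this sketch because re-evaluating a DEFCONSTANT
;; whose value is a fresh string can signal an error in some implementations.
(defparameter +grid-letters+ "ABCDEFGHJKLMNOPQRSTUVWXYZ")

(defun char-coords (char)
  "Returns two values: X and Y coordinates of CHAR in the letter grid."
  (let ((pos (position (char-upcase char) +grid-letters+)))
    (unless pos
      (error "CHAR ~S is not in the grid." char))
    ;; FLOOR returns the row (quotient) first and the column (remainder)
    ;; second, so the two values are swapped to give X before Y.
    (multiple-value-bind (row column) (floor pos 5)
      (values column row))))

(multiple-value-list (char-coords #\a))  ; => (0 0)
(multiple-value-list (char-coords #\j))  ; => (3 1)
(multiple-value-list (char-coords #\Z))  ; => (4 4)
```

Because the lookup goes through POSITION on an explicit string, nothing here depends on the character codes being contiguous.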
> (defun char-coords (char)
>   "Returns two values: X and Y coordinates of CHAR in the letter grid."
>   (setq char (char-upcase char))
>   (let ((pos (position char +grid-letters+)))
>     (unless pos
>       (error "CHAR ~S is not in the grid." char))
>     ;; Our values are the opposite order of FLOORs
>     (multiple-value-bind (quotient remainder) (floor pos 5)
>       (values remainder quotient))))
Yes, I didn't like the dodgy order of the letters in the vector (in your previous version) either.
I also didn't tell the _whole_ truth about the position of the letters in the grid. (See below.)
In case you are wondering who the hell came up with this grid, it's the British Ordnance Survey.
So, here are the two candidate solutions. I'm not sure which one is cleaner. Maybe, because of the faffing around necessary to invert the order in no. 1, no. 2 ends up more transparent.
Anyway, thanks for the ideas.
(defconstant +grid-vector+ "ABCDEFGHJKLMNOPQRSTUVWXYZ")

(defun grid-letter-to-position-1 (letter)
  "Converts a single letter into a pair of values representing the
letter's position in a 5x5 grid of letters:

  A B C D E     04 14 24 34 44
  F G H J K     03 13 23 33 43
  L M N O P     02 12 22 32 42
  Q R S T U     01 11 21 31 41
  V W X Y Z     00 10 20 30 40"
  (let* ((upcased-letter (char-upcase letter))
         (pos (position upcased-letter +grid-vector+)))
    (unless pos
      (error "The character ~S is not valid in a sheet name." letter))
    (multiple-value-bind (south east) (floor pos 5)
      (values east (- 4 south)))))
(defun grid-letter-to-position-2 (letter)
  "Converts a single letter into a pair of values representing the
letter's position in a 5x5 grid of letters:

  A B C D E     04 14 24 34 44
  F G H J K     03 13 23 33 43
  L M N O P     02 12 22 32 42
  Q R S T U     01 11 21 31 41
  V W X Y Z     00 10 20 30 40"
  (let ((upcased-letter (char-upcase letter)))
    (values
     (ecase upcased-letter
       ((#\A #\F #\L #\Q #\V) 0)
       ((#\B #\G #\M #\R #\W) 1)
       ((#\C #\H #\N #\S #\X) 2)
       ((#\D #\J #\O #\T #\Y) 3)
       ((#\E #\K #\P #\U #\Z) 4))
     (ecase upcased-letter
       ((#\A #\B #\C #\D #\E) 4)
       ((#\F #\G #\H #\J #\K) 3)
       ((#\L #\M #\N #\O #\P) 2)
       ((#\Q #\R #\S #\T #\U) 1)
       ((#\V #\W #\X #\Y #\Z) 0)))))
>> I guess I can't claim that this is going to become too tedious or
>> error-prone to write for, err, much longer alphabets :-)
>>
>> Having said that, the fact that the approach _is_ error-prone is
>> demonstrated by the line of lowercase chars in your code.
>And the lack of one letter too.
Which letter is missing? #\I is *supposed* to be skipped; his application finds the location of the letter in a 5x5 grid, and you can't put all 26 letters of the alphabet in a 25-element grid.
-- Barry Margolin
* Jacek Generowicz <jacek.generow...@cern.ch>
| I guess I can't claim that this is going to become too tedious or
| error-prone to write for, err, much longer alphabets :-)
I "wrote" it using Emacs Lisp. For each line I evaluated
(loop for i from ?V to ?Z do (insert (format "#\\%c " i)))
You could have accomplished the same thing with #., but I think the code looks a lot better this way.
Unfortunately, I typed in lower-case letters in one of the lines.
| Having said that, the fact that the approach _is_ error-prone is
| demonstrated by the line of lowercase chars in your code.
I have added you to my do-not-help list.
/// -- In a fight against something, the fight has value, victory has none. In a fight for something, the fight is a loss, victory merely relief.
There is nothing error-prone about this. He just made a typo; you aren't going to get anything tighter than a CASE statement like this.
Alas, like any source code, it is subject to the failings of our fingers. Then again, he might have done this deliberately, for educational purposes, to see if you were a student looking for homework assistance (who can tell?).
Your character sets can't get any longer than what's available on your system (255?). You can always write a function or macro around his idea to handle expressions of character intervals (regular-expression patterns like [A-G,a-g]) that expand into a CASE statement with very large keys.
Looks like you've solved it. They are both sound solutions; unless this is a library routine, the difference in memory and execution-time cost between the two is probably not important.
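The macro idea mentioned above could look roughly like the following. Everything here is hypothetical: the names RANGE-ECASE, EXPAND-CHAR-RANGE, *ALPHABET* and GRID-ROW, and the (:range from to) key syntax, are invented for this sketch. Ranges are expanded by position in an explicit alphabet string at macroexpansion time, so no assumption is made about character codes.

```lisp
(eval-when (:compile-toplevel :load-toplevel :execute)
  ;; Explicit alphabet, so ranges do not depend on contiguous char codes.
  (defparameter *alphabet* "ABCDEFGHIJKLMNOPQRSTUVWXYZ")

  (defun expand-char-range (from to)
    "List the uppercase letters FROM..TO inclusive, by position in *ALPHABET*."
    (let ((start (position from *alphabet*))
          (end (position to *alphabet*)))
      (assert (and start end (<= start end)) ()
              "Bad letter range ~S..~S" from to)
      (coerce (subseq *alphabet* start (1+ end)) 'list))))

(defmacro range-ecase (keyform &body clauses)
  "Like ECASE, but a key written (:range #\\A #\\E) expands to each letter."
  `(ecase ,keyform
     ,@(loop for (keys . body) in clauses
             collect (cons (loop for key in keys
                                 append (if (and (consp key)
                                                 (eq (first key) :range))
                                            (expand-char-range (second key)
                                                               (third key))
                                            (list key)))
                           body))))

(defun grid-row (letter)
  "Map a grid letter to 0-4; any other character signals an error."
  (range-ecase (char-upcase letter)
    (((:range #\A #\E)) 0)
    (((:range #\F #\H) #\J #\K) 1)
    (((:range #\L #\P)) 2)
    (((:range #\Q #\U)) 3)
    (((:range #\V #\Z)) 4)))
```

The expansion is an ordinary ECASE with the letters spelled out, so the run-time behaviour is the same as the hand-written version; the macro only saves the typing.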
In article <3C886F6D.6AF79...@hotmail.com>, Steve Long <sal6...@hotmail.com> wrote:
>Jacek,
>There is nothing error-prone about this. He just made a typo; you aren't
>going to get anything tighter than a CASE statement like this.
I think that the OP was hoping that there were some built-in way to map letters directly to their position in the alphabet. If the codes were required to be contiguous, you could do something like:
(- (char-code upc) (char-code #\A))
The reason that this is not specified is because we didn't want to tie Common Lisp to specific character encodings. In particular, I believe EBCDIC does not have the property that the letters have sequential values, the way they do in ASCII. All well-known character encodings follow the partial ordering that Common Lisp specifies for char-code, but anything more specific would probably rule out some well-known encodings.
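To make the contrast concrete, here is a small sketch of the two approaches. The function names are made up for illustration. The first relies only on what the language guarantees; the second is the CHAR-CODE subtraction, which is correct only where A-Z happen to have contiguous codes.

```lisp
(defun letter-index (char)
  "Return 0 for A/a through 25 for Z/z, or NIL if CHAR is not one of
the 26 standard letters.  Portable: no assumption about char codes."
  (position (char-upcase char) "ABCDEFGHIJKLMNOPQRSTUVWXYZ"))

(defun letter-index-by-code (char)
  "Non-portable variant: only correct where A-Z have contiguous codes."
  (- (char-code (char-upcase char)) (char-code #\A)))
```

On an ASCII-based implementation the two agree for every letter; on an encoding with gaps between letters the second would return wrong indices for letters past a gap.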
Note also that CL provides the DIGIT-CHAR function. In other programming languages it's common to assume the decimal digits are sequential and do something analogous to:
(code-char (+ (char-code #\0) digit))
In an implementation that uses ASCII, this is likely to be what the definition of DIGIT-CHAR looks like (there will also be code to handle radix > 10), but the application programmer doesn't have to deal with it. Similarly, this is the reason why DIGIT-CHAR-P returns the digit's numeric value.
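A few calls illustrate the two standard functions. The last definition is an aside rather than something proposed in this thread: since DIGIT-CHAR-P in radix 36 gives the letters A-Z the weights 10-35 regardless of case, it can serve as a portable letter-to-index mapping. LETTER-INDEX-36 is a made-up name.

```lisp
(digit-char 7)         ; => #\7
(digit-char 11 16)     ; => #\B
(digit-char 10)        ; => NIL, since 10 is not a digit weight in radix 10
(digit-char-p #\7)     ; => 7
(digit-char-p #\f 16)  ; => 15
(digit-char-p #\a)     ; => NIL, since a is not a decimal digit

(defun letter-index-36 (char)
  "Return 0-25 for a letter (either case), NIL for anything else.
Uses the radix-36 digit weight, which is 10 for A up to 35 for Z."
  (let ((weight (digit-char-p char 36)))
    (when (and weight (>= weight 10))
      (- weight 10))))
```

For the grid problem the missing I would still need special handling, for example by subtracting one from the index of every letter after I.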
-- Barry Margolin
> There is nothing error-prone about this. He just made a typo; you aren't
> going to get anything tighter than a CASE statement like this.
You're not? How about
(let ((code (char-code ch)))
  (if (>= code 128)
      -1
      (svref #.(let ((array (make-array 128 :initial-element -1)))
                 (loop for ch across "abcdefghjklmnopqrstuvwxyz" ;apparently no "i"
                       for lc = (char-code ch)
                       for uc = (char-code (char-upcase ch))
                       for i from 0
                       do ;; I made both upper and lowercase work here, but
                          ;; obviously you could choose one or the other.
                          (setf (aref array lc)
                                (setf (aref array uc) (truncate i 5))))
                 ;; The LET must return the filled table, not LOOP's NIL.
                 array)
             code)))
In most implementations, I'd expect this to be faster than the CASE statement, not that I checked. At least if you're doing straight ordering, since it's 25 compares in the worst case and 12.5 in the average case, compared to the cost of one char-code and one SVREF.
In certain pathological data where you had a very biased distribution and you exploited that in the case ordering, you might do better with the CASE, though I bet even then you'd need nested cases to get optimal speed.
Barry Margolin <bar...@genuity.net> writes:
> In article <3C886F6D.6AF79...@hotmail.com>,
> Steve Long <sal6...@hotmail.com> wrote:
> >Jacek,
> >
> >There is nothing error-prone about this. He just made a typo; you aren't
> >going to get anything tighter than a CASE statement like this.
>
> I think that the OP was hoping that there were some built-in way to map
> letters directly to their position in the alphabet. If the codes were
> required to be contiguous, you could do something like:
>
> (- (char-code upc) (char-code #\A))
>
> The reason that this is not specified is because we didn't want to tie
> Common Lisp to specific character encodings. In particular, I believe
> EBCDIC does not have the property that the letters have sequential values,
> the way they do in ASCII. All well-known character encodings follow the
> partial ordering that Common Lisp specifies for char-code, but anything
> more specific would probably rule out some well-known encodings.
Exactly. One of the Pascal programs in "Algorithms + Data Structures = Programs" by Wirth had exactly that property. It assumed a character ordering /= ASCII.
Cheers
-- Marco Antoniotti
Erik Naggum <e...@naggum.net> writes:
> The specification of the goddamn _problem_ required I to be excluded, you
> obnoxious dimwit. Pay some _attention_ around here, will you?
I know you have the benefit of never making mistakes, but I do make them sometimes. Oh, actually, your code did have a mistake, but that apparently doesn't count as one.
I already acknowledged my error (kindly pointed out by people who actually have social skills). I only post this message to add the comment that many more insults and my news reader will be automatically scoring your posts so low that I don't see them; if messages of yours seem to call for a response but don't get one, that will be why.
* Thomas Bushnell, BSG
| I know you have the benefit of never making mistakes,

Let me know when you have cooled down so you do not make, or at least post, this kind of stupid insult. FYI: When I make mistakes, I am not averse to being corrected at all, in whatever way seems appropriate, but when some fault-finding asshole finds faults that are not even there, I conclude that it can only be malicious, because it takes so little effort to pay sufficient attention to avoid making the mistake of accusing others of making mistakes they have not made, and if you take the trouble to post a correction, you better be damn sure it is _more_ accurate than what you correct. Is this really too much to ask? Is it really productive for those who mistakenly "correct" others to try to defend themselves by attacking the one they have falsely accused when they should instead apologize for carelessly jumping to wrong conclusions? Returning insults is pretty solid evidence that the false accusation _was_ malicious.
///
Centuries ago, Nostradamus foresaw when Erik Naggum <e...@naggum.net> would write:
> * Thomas Bushnell, BSG
> | And the lack of one letter too.
> The specification of the goddamn _problem_ required I to be
> excluded, you obnoxious dimwit. Pay some _attention_ around here,
> will you?
It was excluded from the "values mapping to 1" sequence.
The very first clause of the spec said a-z map to 0, so for I to map to NIL seems somewhat unexpected.
Mind you, a waggish answer to the problem would stop with the first clause:
That would seem likely to be a bit more correct than what you presented, not that "a bit more correct" is a terribly useful metric...
--
(concatenate 'string "cbbrowne" "@acm.org")
http://www.ntlug.org/~cbbrowne/spiritual.html
Change is inevitable, except from a vending machine.