Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Good form for a character predicate?

42 views
Skip to first unread message

Udyant Wig

unread,
Jan 6, 2019, 12:50:05 AM1/6/19
to
Suppose that a character has to be checked against a list of possible
values. What form of predicate is to be preferred?

(defun char-of-interest-1-p (ch)
(member ch (list char1 char2 char3 char4)))

(defun char-of-interest-2-p (ch)
(or (eq ch char1)
(eq ch char2)
(eq ch char3)
(eq ch char4)))

(defun char-of-interest-3-p (ch)
(case ch
((char1 char2 char3 char4) t)
(otherwise nil)))

Udyant Wig
--
We make our discoveries through our mistakes: we watch one another's
success: and where there is freedom to experiment there is hope to
improve.
-- Arthur Quiller-Couch

Rob Warnock

unread,
Jan 6, 2019, 2:31:23 AM1/6/19
to
Udyant Wig <udy...@gmail.com> wrote:
+---------------
| Suppose that a character has to be checked against a list of possible
| values. What form of predicate is to be preferred?
|
| (defun char-of-interest-1-p (ch)
| (member ch (list char1 char2 char3 char4)))
+---------------

First, with this form you almost never want to build the search list
at runtime. [I'm assuming "char1" here is a metavariable for an actual
character literal, not a real variable *containing* a character.]
So use either this:

(defun char-of-interest-1-p (ch)
(member ch '(#\A #\c #\E #\g)))

or [especially if "charN" *are* variables!]:

(defun char-of-interest-1-p (ch)
(member ch (load-time-value (list char1 char2 char3 char4))))

{And ignore what I just said if the "charN" are *mutable* variables
and *are* mutated during execution.]

+---------------
| (defun char-of-interest-2-p (ch)
| (or (eq ch char1)
| (eq ch char2)
| (eq ch char3)
| (eq ch char4)))
+---------------

Similar comments for this one, except that (a) it's longer;
(b) the MEMBER version might be faster; and (c) it's not even
guaranteed to *work* at all!!! As CLHS "Function EQ" says:

An implementation is permitted to make ``copies'' of characters
and numbers at any time. The effect is that Common Lisp makes
no guarantee that eq is true even when both its arguments are
``the same thing'' if that thing is a character or number.

So if you use this form, use EQL, not EQ.

+---------------
| (defun char-of-interest-3-p (ch)
| (case ch
| ((char1 char2 char3 char4) t)
| (otherwise nil)))
+---------------

This might generate the fastest code of the three so far, but
note that the "charN" must be literal characters, *not* variable
names, since the key list is *not* evaluated. [Note that this
also means that you can't use LOAD-TIME-VALUE with this version.]

A fourth version you might want to use -- especially if
you have a *large* number of characters in the set of
interest -- is a hash table:

(defparameter *char-of-interest-4-hash*
(let ((h (make-hash-table)))
(dolist (c (list char1 char2 char3 char4)) ;; Or (coerce "abcde" 'list), etc.
(setf (gethash c h) c)) ;; Or maybe (setf (gethash c h) t)
h))

(defun char-of-interest-4-p (ch)
(gethash ch *char-of-interest-4-hash*))

Ignoring cache effects, this lookup will be approximately
constant time no matter how many characters are in the set.
[For very small sets, MEMBER or CASE will still probably
be faster.]


-Rob

-----
Rob Warnock <rp...@rpw3.org>
627 26th Avenue <http://rpw3.org/>
San Mateo, CA 94403

Pascal J. Bourguignon

unread,
Jan 6, 2019, 6:42:46 AM1/6/19
to
Udyant Wig <udy...@gmail.com> writes:

> Suppose that a character has to be checked against a list of possible
> values. What form of predicate is to be preferred?
>
> (defun char-of-interest-1-p (ch)
> (member ch (list char1 char2 char3 char4)))
>
> (defun char-of-interest-2-p (ch)
> (or (eq ch char1)
> (eq ch char2)
> (eq ch char3)
> (eq ch char4)))
>
> (defun char-of-interest-3-p (ch)
> (case ch
> ((char1 char2 char3 char4) t)
> (otherwise nil)))


It depends. If the list of character can be dynamic, (is not know at
compilation-time), then one quick and concise way to do it is to put
them in a string (or vector), and to use FIND or POSITION:

(defun whitespacep (ch) (find ch #(#\space #\tab #\newline)))

(defparameter *letters* "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")
(defun letterp (ch) (position ch *letters*))

(let ((*letters* "АБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯабвгдежзийклмнопрстуфхцчшщъыьэюя"))
(map 'list 'letterp "АA))
--> (0 nil)



On the other hand, if the list of character is known at compilation
time, you can let the compiler decide on the faster way to implement the
predicate, by using a CASE:

(defun whitespace (ch)
(case ch
((#\space #\tab #\newline) t)
(otherwise nil)))

(defun letterp (ch)
(case ch
(#.(coerce "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ" 'list) t)
(otherwise nil)))


--
__Pascal J. Bourguignon__
http://www.informatimago.com

Udyant Wig

unread,
Jan 6, 2019, 6:43:49 AM1/6/19
to
On 1/6/19 1:01 PM, Rob Warnock wrote:
> First, with this form you almost never want to build the search list
> at runtime. [I'm assuming "char1" here is a metavariable for an actual
> character literal, not a real variable *containing* a character.]

Indeed, CHAR1 is a metavariable.

> So use either this:
>
> (defun char-of-interest-1-p (ch)
> (member ch '(#\A #\c #\E #\g)))
>
> or [especially if "charN" *are* variables!]:
>
> (defun char-of-interest-1-p (ch)
> (member ch (load-time-value (list char1 char2 char3 char4))))
>
> {And ignore what I just said if the "charN" are *mutable* variables
> and *are* mutated during execution.]

Would it matter much if I had the following, say?

(defconstant +characters+ (list #\a #\b #\c #\d))
(defun char-of-interest-p (ch)
(member ch (load-time-value +characters+)))

Would a quoted list be preferable?

> +---------------
> | (defun char-of-interest-2-p (ch)
> | (or (eq ch char1)
> | (eq ch char2)
> | (eq ch char3)
> | (eq ch char4)))
> +---------------
>
> Similar comments for this one, except that (a) it's longer; (b) the
> MEMBER version might be faster; and (c) it's not even guaranteed to
> *work* at all!!! As CLHS "Function EQ" says:
>
> An implementation is permitted to make ``copies'' of characters
> and numbers at any time. The effect is that Common Lisp makes no
> guarantee that eq is true even when both its arguments are ``the
> same thing'' if that thing is a character or number.
>
> So if you use this form, use EQL, not EQ.

Understood.

> +---------------
> | (defun char-of-interest-3-p (ch)
> | (case ch
> | ((char1 char2 char3 char4) t)
> | (otherwise nil)))
> +---------------
>
> This might generate the fastest code of the three so far, but note
> that the "charN" must be literal characters, *not* variable names,
> since the key list is *not* evaluated. [Note that this also means that
> you can't use LOAD-TIME-VALUE with this version.]

Understood.

> A fourth version you might want to use -- especially if you have a
> *large* number of characters in the set of interest -- is a hash
> table:
>
> (defparameter *char-of-interest-4-hash*
> (let ((h (make-hash-table)))
> (dolist (c (list char1 char2 char3 char4)) ;; Or (coerce "abcde"
'list), etc.
> (setf (gethash c h) c)) ;; Or maybe (setf (gethash c h) t)
> h))
>
> (defun char-of-interest-4-p (ch)
> (gethash ch *char-of-interest-4-hash*))
>
> Ignoring cache effects, this lookup will be approximately constant
> time no matter how many characters are in the set. [For very small
> sets, MEMBER or CASE will still probably be faster.]

Thanks for this advice, too. My current use is for four element lists,
however, and I ought to have made it clear up front.

Is there any general principle governing when to switch over to
hash-tables from simple lists, be they flat or association? What
threshold number of elements ought to be of concern?

> -Rob

All advice is much appreciated. Thanks.

Udyant Wig

unread,
Jan 6, 2019, 9:59:56 AM1/6/19
to
I implemented some different versions of a very simple whitespace
character predicate, testing only among newline, tab, space, and return.

;;;; -*- mode: common-lisp -*-
(defun wsp-1 (ch)
(case ch
((#\newline #\tab #\space #\return) t)
(otherwise nil)))

(defun wsp-2 (ch)
(member ch '(#\newline #\tab #\space #\return)))

(defun wsp-3 (ch)
(member ch (load-time-value (list #\newline #\tab #\space #\return))))

(defconstant +white-space+ (list #\newline #\tab #\return #\space))
(defun wsp-4 (ch)
(member ch +white-space+))

(defun wsp-5 (ch)
(or (eql ch #\newline)
(eql ch #\space)
(eql ch #\tab)
(eql ch #\return)))

(defconstant +ws-hash+
(let ((ws-hash (make-hash-table)))
(dolist (ch +white-space+)
(setf (gethash ch ws-hash) ch))
ws-hash))
(defun wsp-6 (ch)
(gethash ch +ws-hash+))

(defun wsp-test (n)
(let ((spaces (make-string n :initial-element #\a)))
(format t "~%wsp-1")
(time (loop for ch across spaces do (wsp-1 ch)))
(terpri)

(format t "~%wsp-2")
(time (loop for ch across spaces do (wsp-2 ch)))
(terpri)

(format t "~%wsp-3")
(time (loop for ch across spaces do (wsp-3 ch)))
(terpri)

(format t "~%wsp-4")
(time (loop for ch across spaces do (wsp-4 ch)))
(terpri)

(format t "~%wsp-5")
(time (loop for ch across spaces do (wsp-5 ch)))
(terpri)

(format t "~%wsp-6")
(time (loop for ch across spaces do (wsp-6 ch)))
(terpri)))
;;;

Taking Rob Warnock's suggestion, I used EQL instead of EQ; I also
implemented his hash table version in WSP-6. WSP-1 (CASE) was the
fastest, as he had predicted, with the caveat that the list under test
be of literal characters, and not variables.

I ran these functions over strings ten million in length (full of
#\space once, and full of #\a later). The functions WSP-1 (using CASE),
and WSP-5 (using OR and EQL), were uniformly the quickest, even across
different Lisp implementations.

As an aside, some Lisps did better than others at WSP-2 (MEMBER on a
quoted list) and WSP-4 (MEMBER on a defconstant list).

Barry Margolin

unread,
Jan 6, 2019, 4:16:23 PM1/6/19
to
In article <q0s4u9$1kt$1...@dont-email.me>, Udyant Wig <udy...@gmail.com>
wrote:

> Suppose that a character has to be checked against a list of possible
> values. What form of predicate is to be preferred?

What criteria are you interested in? Performance or style?

>
> (defun char-of-interest-1-p (ch)
> (member ch (list char1 char2 char3 char4)))
>
> (defun char-of-interest-2-p (ch)
> (or (eq ch char1)
> (eq ch char2)
> (eq ch char3)
> (eq ch char4)))
>
> (defun char-of-interest-3-p (ch)
> (case ch
> ((char1 char2 char3 char4) t)
> (otherwise nil)))
>
> Udyant Wig

--
Barry Margolin, bar...@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***

Barry Margolin

unread,
Jan 6, 2019, 4:19:22 PM1/6/19
to
In article <q0sas7$ohe$1...@dont-email.me>, rp...@rpw3.org (Rob Warnock)
wrote:

> Udyant Wig <udy...@gmail.com> wrote:
> +---------------
> | Suppose that a character has to be checked against a list of possible
> | values. What form of predicate is to be preferred?
> |
> | (defun char-of-interest-1-p (ch)
> | (member ch (list char1 char2 char3 char4)))
> +---------------
>
> First, with this form you almost never want to build the search list
> at runtime. [I'm assuming "char1" here is a metavariable for an actual
> character literal, not a real variable *containing* a character.]
> So use either this:
>
> (defun char-of-interest-1-p (ch)
> (member ch '(#\A #\c #\E #\g)))
>
> or [especially if "charN" *are* variables!]:
>
> (defun char-of-interest-1-p (ch)
> (member ch (load-time-value (list char1 char2 char3 char4))))

I'd be very surprised if most compilers don't optimize the original code
to something equivalent. MEMBER is a built-in function, the compiler
knows that it doesn't modify or save the sequence argument, so there's
no need to make a fresh copy each time.

Udyant Wig

unread,
Jan 7, 2019, 12:56:11 AM1/7/19
to
On 1/7/19 2:46 AM, Barry Margolin wrote:
> What criteria are you interested in? Performance or style?

Clarity and correctness are paramount for me at present. Being
reasonably fast would be very nice. (I am hesitant about advanced
efficiency matters such as declarations because I believe I will not be
able to do justice with that knowledge right now.)

Udyant Wig

unread,
Jan 7, 2019, 1:07:44 AM1/7/19
to
On 1/6/19 5:12 PM, Pascal J. Bourguignon wrote:
> It depends. If the list of character can be dynamic, (is not know at
> compilation-time), then one quick and concise way to do it is to put
> them in a string (or vector), and to use FIND or POSITION:
>
> (defun whitespacep (ch) (find ch #(#\space #\tab #\newline)))
>
> (defparameter *letters* "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")
> (defun letterp (ch) (position ch *letters*))
>
> (let ((*letters* "АБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯабвгдежзийклмнопрстуфхцчшщъыьэюя"))
> (map 'list 'letterp "АA))
> --> (0 nil)

For small sets of characters, say fewer than ten, would you prefer lists
or vectors? How does the choice of MEMBER, POSITION, or FIND factor
into the selection?

> On the other hand, if the list of character is known at compilation
> time, you can let the compiler decide on the faster way to implement
> the predicate, by using a CASE:
>
> (defun whitespace (ch)
> (case ch
> ((#\space #\tab #\newline) t)
> (otherwise nil)))

Indeed. This was the version I thought of after a first try with
MEMBER. Rob Warnock has commented that this may indeed be the fastest
for a fixed, known set of characters.

> (defun letterp (ch)
> (case ch
> (#.(coerce "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ" 'list) t)
> (otherwise nil)))

For what reasons would one prefer read-time construction of the value
list to holding it with LOAD-TIME-VALUE?

Pascal J. Bourguignon

unread,
Jan 7, 2019, 3:34:18 AM1/7/19
to
Udyant Wig <udy...@gmail.com> writes:

> On 1/6/19 5:12 PM, Pascal J. Bourguignon wrote:
>> It depends. If the list of character can be dynamic, (is not know at
>> compilation-time), then one quick and concise way to do it is to put
>> them in a string (or vector), and to use FIND or POSITION:
>>
>> (defun whitespacep (ch) (find ch #(#\space #\tab #\newline)))
>>
>> (defparameter *letters* "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")
>> (defun letterp (ch) (position ch *letters*))
>>
>> (let ((*letters* "АБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯабвгдежзийклмнопрстуфхцчшщъыьэюя"))
>> (map 'list 'letterp "АA))
>> --> (0 nil)
>
> For small sets of characters, say fewer than ten, would you prefer lists
> or vectors? How does the choice of MEMBER, POSITION, or FIND factor
> into the selection?

Vector of characters are self contained and use less memory than list,
so they should be faster to scan.

>> On the other hand, if the list of character is known at compilation
>> time, you can let the compiler decide on the faster way to implement
>> the predicate, by using a CASE:
>>
>> (defun whitespace (ch)
>> (case ch
>> ((#\space #\tab #\newline) t)
>> (otherwise nil)))
>
> Indeed. This was the version I thought of after a first try with
> MEMBER. Rob Warnock has commented that this may indeed be the fastest
> for a fixed, known set of characters.
>
>> (defun letterp (ch)
>> (case ch
>> (#.(coerce "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ" 'list) t)
>> (otherwise nil)))
>
> For what reasons would one prefer read-time construction of the value
> list to holding it with LOAD-TIME-VALUE?

1- CASE takes a literal, so load-time-value wouldn't work.
2- using #. is only an editing convenience. You could also use emacs
lisp to generate the exact source:

(loop for ch across "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
do (insert (format "#\\%c " ch)))
#\a #\b #\c #\d #\e #\f #\g #\h #\i #\j #\k #\l #\m #\n #\o #\p #\q #\r #\s #\t #\u #\v #\w #\x #\y #\z #\A #\B #\C #\D #\E #\F #\G #\H #\I #\J #\K #\L #\M #\N #\O #\P #\Q #\R #\S #\T #\U #\V #\W #\X #\Y #\Z nil

then edit it into:

(case ch
((#\a #\b #\c #\d #\e #\f #\g #\h #\i #\j #\k #\l #\m #\n #\o #\p #\q
#\r #\s #\t #\u #\v #\w #\x #\y #\z #\A #\B #\C #\D #\E #\F #\G #\H #\I
#\J #\K #\L #\M #\N #\O #\P #\Q #\R #\S #\T #\U #\V #\W #\X #\Y #\Z) t)
(otherwise nil))

It may also be slightly easier to read and understand the #. form.

Spiros Bousbouras

unread,
Jan 21, 2019, 12:39:14 PM1/21/19
to
On Sun, 6 Jan 2019 20:29:50 +0530
Udyant Wig <udy...@gmail.com> wrote:
> I implemented some different versions of a very simple whitespace
> character predicate, testing only among newline, tab, space, and return.

[...]

> (defconstant +white-space+ (list #\newline #\tab #\return #\space))
> (defun wsp-4 (ch)
> (member ch +white-space+))

[...]

> (defconstant +ws-hash+
> (let ((ws-hash (make-hash-table)))
> (dolist (ch +white-space+)
> (setf (gethash ch ws-hash) ch))
> ws-hash))
> (defun wsp-6 (ch)
> (gethash ch +ws-hash+))

Another option is to make your own custom made hash table using CHAR-CODE
as a hash function.

(defconstant max-char-code-value 255)
; Or whatever. Using the following suggestion may or may not be practical
; depending on how large the value is.

(proclaim '(special char-lookup-vector))
(proclaim `(type (simple-array bit (,(+ 1 max-char-code-value))) char-lookup-vector))
(setq char-lookup-vector
(make-array (+ 1 max-char-code-value) :element-type 'bit :initial-element 0))

(dolist (ch +white-space+) (setf (sbit char-lookup-vector (char-code ch)) 1))

(defun wsp-7 (ch) (eql 1 (sbit char-lookup-vector (char-code ch))))


It would be useful if CL provided a way to declare the type of the keys of a hash
table.

--
A Russian-speaking friend claimed that "brekzit" had become a verb in some
eastern-European parts, meaning: saying goodbye to everyone as if you're
about to leave, but then not actually leaving.
http://www.antipope.org/charlie/blog-static/2019/01/someone-please-cancel-2019-alr.html

Helmut Eller

unread,
Jan 21, 2019, 2:52:15 PM1/21/19
to
On Sun, Jan 06 2019, Udyant Wig wrote:

> Suppose that a character has to be checked against a list of possible
> values. What form of predicate is to be preferred?
[...]
A bit late, but I think this variant was not yet mentioned:

(deftype intersting-chars () '(member char1 char2 char3))
(defun char-of-interest-p (x) (typep x 'interesting-chars))

Having a named type for the interesting characters can be useful for
declaring the types of struct slots or in TYPECASE expressions.

Helmut
0 new messages