I wrote the following piece of code:
(deftype whitespace () '(member #\Space #\Newline #\Return #\Tab))
(deftype digit () '(member #\0 #\1 #\2 #\3 #\4 #\5 #\6 #\7 #\8 #\9))
(deftype punctuation () '(member #\. #\, #\! #\? #\; #\" #\' #\] #\[ #\( #\) #\\ #\/ #\{ #\} #\:))
(deftype text-type () '(not (or digit whitespace punctuation)))
(defparameter test-string "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ.")
(defun test-text-type ()
  (loop
    for i from 0 to (1- (length test-string))
    collect (typep (aref test-string i) 'text-type)))
(test-text-type) on sbcl returns
(NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL
NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL
NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL)
while on clisp (test-text-type) returns
(T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T
T T T T T T T T T T T T T NIL)
Is the bug in SBCL or in CLISP? And how should text-type be defined
correctly if I am wrong?
--
With best regards,
Anton Kazennikov. mailto:kazennikov[at]mirea.ru ICQ# 98965967
member by default uses #'eq as the test. Whether or not characters with
identical print-names are actually identical objects is underspecified in
the Common Lisp spec to be implementation specific. In SBCL they are not.
Using char= as the test will work in any implementation. Also you should use
#' instead of just ' in front of the member statement. If you compile the
code and use just ' the member statement will be interpreted, not compiled,
whereas #' will ensure it's compiled. The bottom line: either will work, but
#' may be faster...
"Anton Kazennikov" <kazenni...@gmail.com> wrote in message
news:87wt7ll...@kzn.homelinux.org...
> Is the bug in SBCL or in CLISP?
The bug is in SBCL.
> And how should text-type be defined correctly if I am wrong?
There are the ALPHA-CHAR-P and ALPHANUMERICP predicates.
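You're missing all these digits:
(defun numericp (ch) (and (alphanumericp ch) (not (alpha-char-p ch))))
;; Collect every character that is alphanumeric but not alphabetic.
;; CODE-CHAR may return NIL for some codes, so those are skipped.
(coerce (loop :for i :from 0 :below char-code-limit
              :for ch = (code-char i)
              :when (and ch (numericp ch)) :collect ch)
        'string)
Perhaps SBCL handles SATISFIES types better than OR types.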
(defun spacep (ch) ; on Mac, newline could be return!!!
  (member ch '(#\Space #\Newline #\Linefeed #\Return #\Tab)))
(defun punctuationp (ch)
  (not (or (spacep ch) (alphanumericp ch))))
(deftype whitespace () '(satisfies spacep))
(deftype digit () '(satisfies numericp))
(deftype alphabetic () '(satisfies alpha-char-p))
(deftype punctuation () '(satisfies punctuationp))
(defparameter *test-string*
"HeiHyvääpäivääΓειάσαςשלוםЗдравствуйтеสวัสดีครับสวัสดีค่ะこんにちはコンニチハ你好안녕하세요안녕하십니까")
(defun test-alphabetic ()
  (loop :for ch :across *test-string*
        :collect (cons (typep ch 'alphabetic) (string ch))))
This gives good results on both implementations, but a few differences
remain between SBCL and CLISP:
(coerce (loop :for ch :across *test-string*
              :collect (if (typep ch 'alphabetic) 1 0)) 'bit-vector)
SBCL:
#*1111111111111111111111111111111111111110110110111011010111111111111111111111111
CLISP:
#*1111111111111111111111111111111111111111111111111111111111111111111111111111111
The zeros come from characters such as "ั" being considered an "accent"
(and so not alphabetic) by SBCL.
SBCL> (let ((*test-string* "สวัสดีครับ")) (test-alphabetic))
((T . "ส") (T . "ว") (NIL . "ั") (T . "ส") (T . "ด") (NIL . "ี") (T . "ค")
(T . "ร") (NIL . "ั") (T . "บ"))
CLISP> (let ((*test-string* "สวัสดีครับ")) (test-alphabetic))
((T . "ส") (T . "ว") (T . "ั") (T . "ส") (T . "ด") (T . "ี") (T . "ค")
(T . "ร") (T . "ั") (T . "บ"))
I don't know what splits the "วั" into "ว" and "ั"; perhaps it's Emacs,
perhaps the inferior Lisps.
(And it seems that some characters found in emacs Hello file cannot be
encoded in unicode.)
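Coming back to the original question, a text-type in the same SATISFIES style could be sketched as below. This is only a sketch: *non-text-chars* and text-char-p are names made up for the illustration, the excluded characters are copied from the original post, and unlike the original (not (or ...)) definition this version only accepts characters, never other objects.

```lisp
;; Hypothetical names; the excluded characters come from the original post.
(defparameter *non-text-chars*
  (concatenate 'string
               "0123456789"                               ; digits
               ".,!?;\"'][()\\/{}:"                       ; punctuation
               (coerce '(#\Space #\Newline #\Return #\Tab) 'string)))

(defun text-char-p (object)
  "True if OBJECT is a character that is not in *NON-TEXT-CHARS*."
  (and (characterp object)
       (not (find object *non-text-chars*))))   ; FIND compares with EQL

(deftype text-type () '(satisfies text-char-p))

;; Usage: every letter of the string passes, the trailing period does not.
(every (lambda (ch) (typep ch 'text-type)) "abcXYZ")   ; true
(typep #\. 'text-type)                                 ; false
```

Because the type is checked by an ordinary function, it does not depend on how an implementation represents negated character-set types.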
--
__Pascal Bourguignon__ http://www.informatimago.com/
"This statement is false." In Lisp: (defun Q () (eq nil (Q)))
Looks like a bug in SBCL (related to negated character set types).
Applying the following patch and recompiling SBCL should fix the
problem.
Index: src/code/late-type.lisp
===================================================================
RCS file: /cvsroot/sbcl/sbcl/src/code/late-type.lisp,v
retrieving revision 1.134
diff -u -b -r1.134 late-type.lisp
--- src/code/late-type.lisp 13 Sep 2006 15:59:33 -0000 1.134
+++ src/code/late-type.lisp 30 Sep 2006 15:40:45 -0000
@@ -3118,8 +3118,8 @@
(when (> (caar pairs) 0)
(push (cons 0 (1- (caar pairs))) not-pairs))
(do* ((tail pairs (cdr tail))
- (high1 (cdar tail))
- (low2 (caadr tail)))
+ (high1 (cdar tail) (cdar tail))
+ (low2 (caadr tail) (caadr tail)))
((null (cdr tail))
(when (< (cdar tail) (1- sb!xc:char-code-limit))
(push (cons (1+ (cdar tail))
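The change in the patch is about DO* step forms: a variable given only an init form keeps that value for the whole loop. The sketch below illustrates this with two made-up functions (gaps-without-step and gaps-with-step are hypothetical names, and this is not SBCL's actual code). Both take a sorted list of (low . high) pairs and try to collect the gaps between consecutive pairs.

```lisp
;; Hypothetical illustration of the DO* pitfall; not SBCL's actual code.
(defun gaps-without-step (pairs)
  "Buggy: HIGH1 and LOW2 have no step forms, so they never change."
  (do* ((tail pairs (cdr tail))
        (high1 (cdar tail))
        (low2 (caadr tail))
        (gaps '()))
       ((null (cdr tail)) (nreverse gaps))
    (push (cons (1+ high1) (1- low2)) gaps)))

(defun gaps-with-step (pairs)
  "Fixed: HIGH1 and LOW2 are recomputed from TAIL on every iteration."
  (do* ((tail pairs (cdr tail))
        (high1 (cdar tail) (cdar tail))
        (low2 (caadr tail) (caadr tail))
        (gaps '()))
       ((null (cdr tail)) (nreverse gaps))
    (push (cons (1+ high1) (1- low2)) gaps)))

(gaps-without-step '((10 . 20) (30 . 40) (50 . 60)))
;; => ((21 . 29) (21 . 29))   the first gap is repeated
(gaps-with-step '((10 . 20) (30 . 40) (50 . 60)))
;; => ((21 . 29) (41 . 49))
```

With only two pairs the two versions agree, which may be why a bug of this shape can go unnoticed until a type with several character ranges is negated.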
--
Juho Snellman
"Richard S. Hall" <rha...@earthlink.net> wrote in message
news:ShwTg.3203$Y24....@newsread4.news.pas.earthlink.net...
> Use "(deftype xxx () #'(member ... :test #'char=)"
Wrong. Use (deftype xxx () '(member #\a #\b)) .
CL-USER 5 : 2 > (deftype xxx () '(member #\a #\b))
XXX
CL-USER 6 : 2 > (typep #\a 'xxx)
T
CL-USER 7 : 2 > (typep #\c 'xxx)
NIL
> member by default uses #'eq as the test.
Wrong. It uses EQL.
> Whether or not characters with
> identical print-names are actually identical objects is underspecified in
> the Common Lisp spec to be implementation specific.
Characters don't have print names. EQL is guaranteed to return T if
two identical characters are passed to it, regardless of whether or
not they are the same object.
> In SBCL they are not.
> Using char= as the test will work in any implementation.
So will EQL!
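A few standard forms illustrate the point, as a minimal sketch using nothing beyond standard Common Lisp. The second argument in the first two forms is read out of a string at run time, so it need not be the same object as the literal #\a.

```lisp
;; EQL compares characters by value, so object identity does not matter.
(eql #\a (char "banana" 1))     ; true
(char= #\a (char "banana" 1))   ; true
(eql #\a #\A)                   ; false: case matters to EQL, as it does to CHAR=

;; The MEMBER function defaults to EQL, so it finds characters reliably.
(member #\c '(#\a #\b #\c))     ; returns the tail (#\c)
```

Only EQ is allowed to give either answer for two occurrences of the same character, which is why EQ is the one to avoid for characters and numbers.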
> Also you should use
> #' instead of just ' in front of the member statement.
Wrong. DEFTYPE takes a type specifier, not a function.
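To make that concrete: the body of a DEFTYPE is evaluated like a macro body and must return a type specifier, which is why it is quoted (or built with backquote) rather than written with #'. A small sketch, with char-among and vowel as made-up names for the illustration:

```lisp
;; CHAR-AMONG and VOWEL are hypothetical names for this illustration.
(deftype char-among (&rest chars)
  "Expands into a MEMBER type specifier listing CHARS."
  `(member ,@chars))

(deftype vowel ()
  '(char-among #\a #\e #\i #\o #\u))

(typep #\e 'vowel)   ; true
(typep #\z 'vowel)   ; false
```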
> If you compile the
> code and use just ' the member statement will be interpreted not compiled
> where #' will ensure it's compiled.
Huh?? In SBCL in particular, there is no interpreter; everything is
compiled. In other implementations, your statement is still wrong.
> The bottom line, either will work, but
> #' (may) be faster...
Don't top-post.
Giving wrong advice is worse than giving no advice at all.
> My bad, I just re-read the hyperspec - member FUNCTION uses #'eq by
> default
Where do you see this?
You are right, I should do my homework before trusting my foggy old memory!
The issue with SBCL is not with character objects but with strings. In
SBCL:
>(member "a" '("a" "b" "c" "d"))
NIL ; (which may be non nil in many implementations)
or even
>(eq "a" "a")
NIL
>(eql "a" "a")
NIL
however:
(member #\a '(#\a #\b #\c #\d))
(#\a #\b #\c #\d) ; (always non-nil)
I've had this issue with things that "work by coincidence" in other
implementations "working as designed" in SBCL, so I jumped to conclusions
because it's the first thing I think of when I see something that looks the
same (OK, print-name is the wrong term) being unequal in SBCL. BTW the
hyperspec doesn't say what the member FUNCTION defaults to, although it does
for the TYPE-SPECIFIER, so I relied on my memory. Clearly a mistake. As a
result I learned something that will stick better in my memory. Hopefully
more good than harm was ultimately done. :-)
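For the string case described above, a sketch of the usual remedy is to pass an explicit test that compares contents rather than identity. COPY-SEQ is used here only to guarantee two distinct string objects, so the results do not depend on whether the implementation coalesces literals.

```lisp
;; Two distinct string objects with the same contents.
(eql "a" (copy-seq "a"))                       ; false: different objects
(equal "a" (copy-seq "a"))                     ; true: same contents
(string= "a" (copy-seq "a"))                   ; true

;; MEMBER with its default EQL test misses the fresh copy ...
(member (copy-seq "a") '("a" "b" "c" "d"))                   ; NIL
;; ... while an explicit :TEST finds it and returns the matching tail.
(member (copy-seq "a") '("a" "b" "c" "d") :test #'string=)   ; ("a" "b" "c" "d")
```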
-- Rick
> My bad, I just re-read the hyperspec - member FUNCTION uses #'eq by
> default
No, it uses EQL by default.
Paul