Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.

Dismiss

Strings and lists

4 views

Skip to first unread message

Josef Eschgfaeller

unread,

Jun 20, 1999, 3:00:00 AM6/20/99

I'm trying to improve

(defun first-word (a)
(let ((pos (position #\space a))) (if (not pos) a
(subseq a 0 pos))))

In such algorithms for strings it is often not clear to me, in
which cases it is convenient to introduce lists, and when it has
sense to mix strings with lists.
When I try to apply the following idea, valid for a list of characters,

(defun first-part (a)
(do ((part nil) (b a (rest b)))
((or (not b) (char= (first b) #\space)) (nreverse part))
(push (first b) part)))

I could begin by converting the string to a list by
(coerce a 'list), apply then first-part and reconvert
by (coerce b 'string). The second conversion seems easy,
the first one expensive.

I tried it also directly, as in

(defmacro char-ok (a i) `(and (array-in-bounds-p ,a ,i) (char ,a ,i)))

(defun first-word (a)
(do* ((word "" (push-char word x)) (i 0 (+ i 1))
(x (char-ok a i) (char-ok a i)))
((or (not x) (char= x #\space)) word)))

with

(defun push-char (word x) (concatenate 'string word (list x)))

- probably not efficient - or

(defun first-word (a)
(do* ((word (make-array '0 :adjustable t :fill-pointer t))
(i 0 (+ i 1)) (x (char-ok a i) (char-ok a i)))
((or (not x) (char= x #\space)) (coerce word 'string))
(vector-push-extend x word)))

J. Eschgfaeller

Erik Naggum

unread,

Jun 20, 1999, 3:00:00 AM6/20/99

* Josef Eschgfaeller <e...@felix.unife.it>

it is hard to read your code the way you (don't) indent it.

(defun first-word (a)
(let ((pos (position #\space a)))
(if (not pos)
a
(subseq a 0 pos))))

| In such algorithms for strings it is often not clear to me in which cases
| it is convenient to introduce lists and when it has sense to mix strings
| with lists.

you would have to have extraordinary needs for lists to make sense, but
instead of writing code for strings, write your code for sequences --
that way, you won't have to decide on types until you have a particular
need. note that POSITION, for instance, works on sequences, not only
strings. that is, (position #\Space (coerce "foo bar zot" 'list)) works
just as well as (position #\Space "foo bar zot"). likewise for SUBSEQ,
which is short for SUBSEQUENCE. many new Lisp programmers are thrown by
the fact that most usual string operations aren't string operations in
Lisp, but sequence operations, like SEARCH, POSITION, and SUBSEQ.

speaking generally, I think you should consider a less specific function,
since hacking up sequences is a very common task, and you also often need
to know whether there are further elements in the sequence you hack up.
in other words, return both halves.

(defun get-token (sequence delimiter &key (key #'identity) (test #'eql))
"Return the non-empty subsequences of SEQUENCES before and after the first
occurrence of the DELIMITER as two values, or NIL if it would be empty."
(let* ((start (or (position delimiter sequence
:key key
:test (complement test))
(return-from get-token nil)))
(end (or (position delimiter sequence
:key key
:test test
:start start)
(return-from get-token (subseq sequence start))))
(next (or (position delimiter sequence
:key key
:test (complement test)
:start (1+ end))
(return-from get-token (subseq sequence start)))))
(values (subseq sequence start end) (subseq sequence next))))

when hacking up fairly large strings, it makes sense to be cons-sensitive
and replace the last form with

(values (subseq sequence start end)
(if (vectorp sequence)
(multiple-value-bind (base offset)
(array-displacement sequence)
(if base
(adjust-array sequence (- (length sequence) next)
:displaced-to base
:displaced-index-offset (+ next offset))
(make-array (- (length sequence) next)
:element-type (array-element-type sequence)
:displaced-to sequence
:displaced-index-offset next)))
(subseq sequence next)))

which will not cause the tail of the string to needlessly copied.

the above is somewhat expanded from production code in one of my systems.
I use MAKE-INDIRECT-VECTOR and MOVE-INDIRECT-VECTOR to manipulate
displaced strings, so the consequent of the final conditional is actually
only (move-indirect-vector sequence :start next). (this is why there are
no :START and :END keywords.)

#:Erik
--
@1999-07-22T00:37:33Z -- pi billion seconds since the turn of the century

0 new messages