How to split a string (or arbitrary sequence) at each occurrence of a value.

Daniel Pittman

Oct 12, 2001, 4:35:25 AM
I am looking for the simplest way to split a string into four strings
based on a character -- to parse an IP address string, specifically.

What is the best, easiest, fastest, etc, way to split a string into
substrings based on a character position. In Emacs Lisp I would just:

(let ((address "210.23.138.16"))
  (split-string address "\\.")) ; second arg is regexp to split on.

Now, I don't actually need regexp functionality here; a literal '.' is
enough for me.

This strikes me as the sort of idiom that would be common enough for
Common Lisp[1] to feature it as part of the standard.

I would also be interested to know if y'all can suggest a general way to
do this for generalized sequences as well as for strings, but that's not
what I need to do right now.


Oh, and am I making a really silly mistake storing an IP address in a
slot of ":type (vector (integer 0 255) 4)"?

Daniel

Footnotes:
[1] CLISP 2.27, specifically, with the HyperSpec as reference.

--
Money won't buy happiness, but it will pay the salaries of
a large research staff to study the problem.
-- Bill Vaughan

Dr. Edmund Weitz

Oct 12, 2001, 5:24:17 AM
Daniel Pittman <dan...@rimspace.net> writes:

> I am looking for the simplest way to split a string into four strings
> based on a character -- to parse an IP address string, specifically.

I needed something similar last week and came up with this solution:

(defun split (sequence &key
                       ;; EQL rather than EQ: EQ is not guaranteed to
                       ;; work on characters.
                       (test #'(lambda (x) (eql x #\Space))))
  "Returns a list of sub-sequences of SEQUENCE where each
element that satisfies TEST is treated as a separator."
  (let (result)
    (do* ((old-pos (position-if-not test sequence)
                   (when old-pos
                     (position-if-not test sequence
                                      :start old-pos))))
         ((null old-pos) (nreverse result))
      (let ((new-pos
              (position-if test sequence
                           :start old-pos)))
        (if new-pos
            (setf result (cons
                          (subseq sequence old-pos new-pos)
                          result)
                  old-pos (1+ new-pos))
            (setf result (cons
                          (subseq sequence old-pos)
                          result)
                  old-pos nil))))))

Note that this might not be very fast; I didn't need it to be. Also note
that I'm rather new to CL, so others here will definitely have better
solutions.

Best regards,
Edi.

Erik Haugan

Oct 12, 2001, 5:28:07 AM
* Daniel Pittman <dan...@rimspace.net>

> What is the best, easiest, fastest, etc, way to split a string into
> substrings based on a character position. In Emacs Lisp I would just:

This may not be fast (I don't know), but it's straightforward and readable.

(defun split (string &optional (delimiter #\Space))
  (with-input-from-string (*standard-input* string)
    (let ((*standard-output* (make-string-output-stream)))
      (nconc (loop for char = (read-char nil nil nil)
                   while char
                   if (char= char delimiter)
                     collect (get-output-stream-string *standard-output*)
                   else
                     do (write-char char))
             (list (get-output-stream-string *standard-output*))))))

Erik

Christophe Rhodes

Oct 12, 2001, 5:40:41 AM
Daniel Pittman <dan...@rimspace.net> writes:

> I am looking for the simplest way to split a string into four strings
> based on a character -- to parse an IP address string, specifically.
>

> [snip]

>
> This strikes me as the sort of idiom that would be common enough for
> Common Lisp[1] to feature it as part of the standard.
>
> I would also be interested to know if y'all can suggest a general way to
> do this for generalized sequences as well as for strings, but that's not
> what I need to do right now.

See <URL:http://ww.telent.net/cliki/PARTITION>.

> Oh, and am I making a really silly mistake storing an IP address in a
> slot of ":type (vector (integer 0 255) 4)"?

No, that's less stupid than a lot of other representations :-)

Cheers,

Christophe
--
Jesus College, Cambridge, CB5 8BL +44 1223 510 299
http://www-jcsu.jesus.cam.ac.uk/~csr21/ (defun pling-dollar
(str schar arg) (first (last +))) (make-dispatch-macro-character #\! t)
(set-dispatch-macro-character #\! #\$ #'pling-dollar)

Christophe Rhodes

Oct 12, 2001, 5:43:03 AM
[ superseded to clarify ]

Daniel Pittman <dan...@rimspace.net> writes:

> I am looking for the simplest way to split a string into four strings
> based on a character -- to parse an IP address string, specifically.
>

> [snip]

>
> This strikes me as the sort of idiom that would be common enough for
> Common Lisp[1] to feature it as part of the standard.
>
> I would also be interested to know if y'all can suggest a general way to
> do this for generalized sequences as well as for strings, but that's not
> what I need to do right now.

See <URL:http://ww.telent.net/cliki/PARTITION>, wherein a
community-discussed function is described in roughly
specification-level detail, with links to a reference implementation.
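
For readers without the CLiki page to hand, the general-sequence case asked about above can be sketched in a few lines. This is only an illustration: the name SPLIT-AT and its argument order are assumptions made here, and it is not the PARTITION reference implementation.

```lisp
;; Hypothetical helper, not the code linked from CLiki.
(defun split-at (delimiter sequence &key (test #'eql))
  "Return a list of the subsequences of SEQUENCE that lie between
occurrences of DELIMITER.  Adjacent delimiters yield empty subsequences."
  (loop with len = (length sequence)
        for start = 0 then (1+ stop)
        ;; STOP is the index of the next delimiter, or the end of SEQUENCE.
        for stop = (or (position delimiter sequence :start start :test test)
                       len)
        collect (subseq sequence start stop)
        while (< stop len)))

;; (split-at #\. "210.23.138.16")  splits a string
;; (split-at 0 '(1 2 0 3))         splits a list
```

Because it relies only on POSITION and SUBSEQ, the same function should work on strings, other vectors, and lists.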

> Oh, and am I making a really silly mistake storing an IP address in a
> slot of ":type (vector (integer 0 255) 4)"?

No, that's less stupid than a lot of other representations :-)

Erik Haugan

Oct 12, 2001, 9:01:41 AM
Sorry for replying to my own article; however, I made such an inelegant
twist in the code I posted that I feel I have to correct it:

(defun string-split (string &optional (delimiter #\Space))
  (with-input-from-string (*standard-input* string)
    (let ((*standard-output* (make-string-output-stream)))
      (loop for char = (read-char nil nil nil)
            if (or (null char)
                   (char= char delimiter))
              collect (get-output-stream-string *standard-output*)
            else
              do (write-char char)
            while char))))

Erik

Marco Antoniotti

Oct 12, 2001, 9:49:03 AM

Christophe Rhodes <cs...@cam.ac.uk> writes:

> [ superseded to clarify ]
>
> Daniel Pittman <dan...@rimspace.net> writes:
>
> > I am looking for the simplest way to split a string into four strings
> > based on a character -- to parse an IP address string, specifically.
> >
> > [snip]
> >
> > This strikes me as the sort of idiom that would be common enough for
> > Common Lisp[1] to feature it as part of the standard.
> >
> > I would also be interested to know if y'all can suggest a general way to
> > do this for generalized sequences as well as for strings, but that's not
> > what I need to do right now.
>
> See <URL:http://ww.telent.net/cliki/PARTITION>, wherein a
> community-discussed function is described in roughly
> specification-level detail, with links to a reference
> implementation.

I am sorry to be sooo nagging (again) on such a stupid matter. But......

The name PARTITION is inappropriate. SPLIT-SEQUENCE is much more
descriptive of what the function does.

Cheers

--
Marco Antoniotti ========================================================
NYU Courant Bioinformatics Group tel. +1 - 212 - 998 3488
719 Broadway 12th Floor fax +1 - 212 - 995 4122
New York, NY 10003, USA http://bioinformatics.cat.nyu.edu
"Hello New York! We'll do what we can!"
Bill Murray in `Ghostbusters'.

Wade Humeniuk

Oct 12, 2001, 10:33:25 AM
I usually do this type of thing with

(defun read-delimited-string (string &optional (delimiter #\.))
  "Returns a read list of delimited values from a string"
  (read-from-string
   (concatenate 'string "("
                (substitute #\space delimiter string)
                ")")))

CL-USER 3 > (read-delimited-string "210.23.138.16")
(210 23 138 16)
15

CL-USER 4 >

Wade

"Daniel Pittman" <dan...@rimspace.net> wrote in message
news:873d4pj...@inanna.rimspace.net...

Christophe Rhodes

Oct 12, 2001, 10:43:14 AM
Marco Antoniotti <mar...@cs.nyu.edu> writes:

> Christophe Rhodes <cs...@cam.ac.uk> writes:
>
> > [ superseded to clarify ]
> >
> > Daniel Pittman <dan...@rimspace.net> writes:
> >
> > > I am looking for the simplest way to split a string into four strings
> > > based on a character -- to parse an IP address string, specifically.
> > >
> > > [snip]
> > >
> > > This strikes me as the sort of idiom that would be common enough for
> > > Common Lisp[1] to feature it as part of the standard.
> > >
> > > I would also be interested to know if y'all can suggest a general way to
> > > do this for generalized sequences as well as for strings, but that's not
> > > what I need to do right now.
> >
> > See <URL:http://ww.telent.net/cliki/PARTITION>, wherein a
> > community-discussed function is described in roughly
> > specification-level detail, with links to a reference
> > implementation.
>
> I am sorry to be sooo nagging (again) on such a stupid matter. But......
>
> The name PARTITION is inappropriate. SPLIT-SEQUENCE is much more
> descriptive of what the function does.

I suppose this depends if you're a physicist or a set theorist; to a
physicist (me, for example) partition has connotation of putting
partitions into something, to divide it up;

I freely give permission to vendors to include the partition code in
their Lisps; if vendors think that it will help, they are free to call
it 'SPLIT-SEQUENCE' if they like, or 'SPLIT', or whatever. Not that I
generally believe in appeals to the market to determine correctness,
but in this case it's my way of dodging the issue. I *like* the name
'PARTITION', so that's what I call it; others are free to do
otherwise, though as a matter of unifying the community I would rather
hope that they didn't. Ultimately, I accept the possibility that I
will be in a minority of one.

Anyone else want to volunteer ideas for utility functions that
everyone writes? Imagine that CL had a 'PARTITION' in the language;
what would people lament the absence of to comp.lang.lisp once a week?

Tim Moore

Oct 12, 2001, 10:57:41 AM
In article <y6citdl...@octagon.mrl.nyu.edu>, "Marco Antoniotti"
<mar...@cs.nyu.edu> wrote:


> Christophe Rhodes <cs...@cam.ac.uk> writes:

>> See <URL:http://ww.telent.net/cliki/PARTITION>, wherein a
>> community-discussed function is described in roughly
>> specification-level detail, with links to a reference implementation.
> I am sorry to be sooo nagging (again) on such a stupid matter. But......
> The name PARTITION is inappropriate. SPLIT-SEQUENCE is much more
> descriptive of what the function does. Cheers
>

Get over it!

Tim

Add smileys as necessary

Russell Senior

Oct 12, 2001, 4:26:30 PM
>>>>> "Wade" == Wade Humeniuk <hume...@cadvision.com> writes:

Wade> I usually do this type of thing with

Wade> (defun read-delimited-string (string &optional (delimiter #\.))
Wade> "Returns a read list of delimited values from a string"
Wade> (read-from-string
Wade> (concatenate 'string "("
Wade> (substitute #\space delimiter string)
Wade> ")")))
Wade>
Wade> CL-USER 3 > (read-delimited-string "210.23.138.16")
Wade> (210 23 138 16)
Wade> 15

This, of course, won't work the way you want if the delimited values
also contain spaces.

I've been using a split-sequence function that was discussed here on
comp.lang.lisp back in Sept 1998, which works reasonably well. The
problem above, though, raises the question of how one might handle
quoting of delimiters. It hasn't been a problem for me, as usually
things are arranged so that it won't be, but in the general case it
could.


--
Russell Senior ``The two chiefs turned to each other.
sen...@aracnet.com Bellison uncorked a flood of horrible
profanity, which, translated meant, `This is
extremely unusual.' ''

Shannon Spires

Oct 12, 2001, 5:52:04 PM
In article <873d4pj...@inanna.rimspace.net>, Daniel Pittman
<dan...@rimspace.net> wrote:

> Oh, and am I making a really silly mistake storing an IP address in a
> slot of ":type (vector (integer 0 255) 4)"?

I usually store them as 32-bit integers. It's simple that way, and
my TCP/IP stack routines use integers internally anyway. Provided you
have good conversion routines to and from dotted notation for human I/O,
it works well.
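
A minimal sketch of such conversion routines follows. The function names are made up for illustration, and the error handling is an assumption rather than anything from an actual TCP/IP stack.

```lisp
;; Hypothetical names; a sketch, not code from any particular stack.
(defun dotted-quad-to-integer (string)
  "Parse \"a.b.c.d\" into a 32-bit integer, most significant octet first."
  (let ((result 0)
        (start 0))
    (dotimes (i 4 result)
      (let* ((dot (position #\. string :start start))
             ;; PARSE-INTEGER signals an error on empty or non-numeric fields.
             (octet (parse-integer string :start start :end dot)))
        (unless (<= 0 octet 255)
          (error "Octet ~D out of range in ~S" octet string))
        ;; The first three octets must be followed by a dot, the last must not.
        (when (if (< i 3) (null dot) dot)
          (error "Expected exactly four octets in ~S" string))
        (setf result (logior (ash result 8) octet)
              start (if dot (1+ dot) (length string)))))))

(defun integer-to-dotted-quad (address)
  "Format a 32-bit integer as dotted decimal notation."
  (format nil "~D.~D.~D.~D"
          (ldb (byte 8 24) address)
          (ldb (byte 8 16) address)
          (ldb (byte 8 8) address)
          (ldb (byte 8 0) address)))
```

The parser never builds intermediate substrings; it hands PARSE-INTEGER the bounds of each field instead.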

Shannon Spires
svs...@nmia.com

Pierre R. Mai

Oct 12, 2001, 6:19:52 PM
Russell Senior <sen...@aracnet.com> writes:

> I've been using a split-sequence function that was discussed here on
> comp.lang.lisp back in Sept 1998, which works reasonably well. The
> problem above, though, raises the question of how one might handle
> quoting of delimiters. It hasn't been a problem for me, as usually
> things are arranged so that it won't be, but in the general case it
> could.

Once you throw escaping, or similar things into the equation, IMHO the
time has come to write a lexer/parser. This is often only slightly
more complex than calling split-sequence/partition/what-have-you, but
offers you much more flexibility, and IMHO clarity.
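
To give a rough idea of how small such a lexer can be, here is a sketch of a hand-written scanner that splits on a delimiter while honouring a backslash escape. The name SPLIT-ESCAPED and the choice of escape convention are assumptions made for this example.

```lisp
;; Hypothetical example; the escape convention is an assumption.
(defun split-escaped (string &key (delimiter #\,) (escape #\\))
  "Split STRING at unescaped DELIMITER characters.
ESCAPE makes the character that follows it literal."
  (let ((fields '())
        (field (make-string-output-stream))
        (escaped nil))
    (loop for char across string
          do (cond (escaped
                    ;; Previous character was ESCAPE: take this one as is.
                    (write-char char field)
                    (setf escaped nil))
                   ((char= char escape)
                    (setf escaped t))
                   ((char= char delimiter)
                    ;; GET-OUTPUT-STREAM-STRING also resets the stream.
                    (push (get-output-stream-string field) fields))
                   (t
                    (write-char char field))))
    (push (get-output-stream-string field) fields)
    (nreverse fields)))
```

A trailing escape character with nothing after it is silently dropped here; a real lexer would probably want to report it.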

Regs, Pierre.

--
Pierre R. Mai <pm...@acm.org> http://www.pmsf.de/pmai/
The most likely way for the world to be destroyed, most experts agree,
is by accident. That's where we come in; we're computer professionals.
We cause accidents. -- Nathaniel Borenstein

Wade Humeniuk

Oct 12, 2001, 7:09:23 PM
> This, of course, won't work the way you want if the delimited values
> also contain spaces.
>

Of course not, but it does not have to in the case of dotted IP addresses.

This raises the issue of a generalized parser/reader for any conceivable
situation versus writing a special-purpose reader for specific cases. The time
needed to implement a generalized solution (like regular expressions)
outweighs the time to implement a hundred of the specific readers. Lazy man's
way out.

Here is a snippet of a parsing/reading problem from the LWW port for Aserve.
Delimiters are slightly more complex.

;;;
;;; DATE-TO-UNIVERSAL-TIME
;;; This is a contribution of Wade Humeniuk <hume...@cadvision.com>
;;; It reimplements the original function without MATCH-REGEXP
;;; which is not fully implemented in ACL-COMPAT
;;;

(defvar *net.aserve-package* (find-package :net.aserve))

(defun date-to-universal-time (date)
  ;; convert a date string to lisp's universal time
  ;; we accept all 3 possible date formats

  ;; check preferred type first (rfc1123 (formerly rfc822)):
  ;;   Sun, 06 Nov 1994 08:49:37 GMT
  ;; now second best format (but used by Netscape sadly):
  ;;   Sunday, 06-Nov-94 08:49:37 GMT
  ;; finally the third format, from unix's asctime
  ;;   Sun Nov 6 08:49:37 1994

  (let ((date (copy-seq date))
        (*read-eval* nil)
        (*package* *net.aserve-package*))
    (loop for char across date
          for i = 0 then (1+ i)
          when (or (char= #\, char)
                   (char= #\- char)
                   (char= #\: char))
            do (setf (elt date i) #\space))
    (setf date (concatenate 'string "(" date ")"))

    (destructuring-bind (day-of-week day month year hour minute second
                         &optional timezone)
        (read-from-string date)
      (declare (ignore day-of-week timezone))
      (when (symbolp day) ;; probably third format, swap values
        (let ((real-day month)
              (real-month day)
              (real-hour year)
              (real-minute hour)
              (real-second minute)
              (real-year second))
          (setf day real-day
                month real-month
                year real-year
                hour real-hour
                minute real-minute
                second real-second)))
      (setf month (ecase month
                    (jan 1)
                    (feb 2)
                    (mar 3)
                    (apr 4)
                    (may 5)
                    (jun 6)
                    (jul 7)
                    (aug 8)
                    (sep 9)
                    (oct 10)
                    (nov 11)
                    (dec 12)))
      (cond
        ((and (> year 70) (< year 100)) (incf year 1900))
        ((<= year 70) (incf year 2000)))
      ;; Time zone 0: these dates are GMT, and the original code below
      ;; also passes 0. Without it the local time zone would be used.
      (encode-universal-time second minute hour day month year 0))))

#| The original code
(defun date-to-universal-time (date)
;; convert a date string to lisp's universal time
;; we accept all 3 possible date formats

(flet ((cvt (str start-end)
(let ((res 0))
(do ((i (car start-end) (1+ i))
(end (cdr start-end)))
((>= i end) res)
(setq res
(+ (* 10 res)
(- (char-code (schar str i)) #.(char-code #\0))))))))
;; check preferred type first (rfc1123 (formerly rfc822)):
;; Sun, 06 Nov 1994 08:49:37 GMT
(multiple-value-bind (ok whole
day
month
year
hour
minute
second)
(match-regexp
"[A-Za-z]+, \\([0-9]+\\) \\([A-Za-z]+\\) \\([0-9]+\\) \\([0-9]+\\):\\([0-9]+\\):\\([0-9]+\\) GMT"
date
:return :index)
(declare (ignore whole))
(if* ok
then (return-from date-to-universal-time
(encode-universal-time
(cvt date second)
(cvt date minute)
(cvt date hour)
(cvt date day)
(compute-month date (car month))
(cvt date year)
0))))

;; now second best format (but used by Netscape sadly):
;; Sunday, 06-Nov-94 08:49:37 GMT
;;
(multiple-value-bind (ok whole
day
month
year
hour
minute
second)
(match-regexp
"[A-Za-z]+, \\([0-9]+\\)-\\([A-Za-z]+\\)-\\([0-9]+\\) \\([0-9]+\\):\\([0-9]+\\):\\([0-9]+\\) GMT"
date
:return :index)

(declare (ignore whole))

(if* ok
then (return-from date-to-universal-time
(encode-universal-time
(cvt date second)
(cvt date minute)
(cvt date hour)
(cvt date day)
(compute-month date (car month))
(cvt date year) ; cl does right thing with 2 digit dates
0))))


;; finally the third format, from unix's asctime
;; Sun Nov 6 08:49:37 1994
(multiple-value-bind (ok whole
month
day
hour
minute
second
year
)
(match-regexp
"[A-Za-z]+ \\([A-Za-z]+\\) +\\([0-9]+\\) \\([0-9]+\\):\\([0-9]+\\):\\([0-9]+\\) \\([0-9]+\\)"
date
:return :index)

(declare (ignore whole))

(if* ok
then (return-from date-to-universal-time
(encode-universal-time
(cvt date second)
(cvt date minute)
(cvt date hour)
(cvt date day)
(compute-month date (car month))
(cvt date year)
0))))


))
|#


Wade

Erik Naggum

Oct 12, 2001, 7:15:29 PM
* Christophe Rhodes

| See <URL:http://ww.telent.net/cliki/PARTITION>, wherein a
| community-discussed function is described in roughly specification-level
| detail, with links to a reference implementation.

* Marco Antoniotti


| I am sorry to be sooo nagging (again) on such a stupid matter. But......
| The name PARTITION is inappropriate. SPLIT-SEQUENCE is much more
| descriptive of what the function does.

* Tim Moore
| Get over it!

But "partition" is such a _fantastically_ bad name, especially to people
who know a bit of mathematical terminology. Effectively using up that
name forever for something so totally unrelated to the mathematical
concept is hostile. It is like defining a programming language where
"sin" and "tan" are operations on (in) a massage parlor just because the
designers are more familiar with them than with mathematics. "Partition"
is a good name for a string-related function when the _only_ thing you
think about is strings, or sequences at best. At the very least, it
should be called partition-sequence, but even this sounds wrong to me.

I tend to use :start and :end arguments to various functions instead of
splitting one string into several, and make sure that functions I write
accept :start and :end arguments, and that they work with all sequences
and useful element types, not only strings and characters.
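
One way to read this advice is sketched below: instead of returning substrings, a function reports the bounds of each field and the caller passes them on as :start and :end. The names MAP-FIELD-BOUNDS and PARSE-IP-OCTETS are invented for this illustration and are not from any posted code.

```lisp
;; Hypothetical sketch of the :start/:end style; names are assumptions.
(defun map-field-bounds (function sequence delimiter
                         &key (start 0) (end (length sequence)) (test #'eql))
  "Call FUNCTION with the start and end index of each field of SEQUENCE
between START and END.  No elements are copied."
  (loop for field-start = start then (1+ field-end)
        for field-end = (or (position delimiter sequence
                                      :start field-start :end end :test test)
                            end)
        do (funcall function field-start field-end)
        while (< field-end end)))

(defun parse-ip-octets (string)
  "Parse a dotted quad into a vector of four octets without making substrings."
  (let ((octets (make-array 4 :element-type '(integer 0 255)
                              :initial-element 0))
        (index 0))
    (map-field-bounds (lambda (start end)
                        ;; PARSE-INTEGER works directly on the bounded region.
                        (setf (aref octets index)
                              (parse-integer string :start start :end end))
                        (incf index))
                      string #\.)
    octets))
```

This sketch does not check that exactly four fields are present; a fifth field would run past the end of the vector.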

///

Kenny Tilton

Oct 12, 2001, 10:11:00 PM

Erik Naggum wrote:
>
> * Christophe Rhodes
> | See <URL:http://ww.telent.net/cliki/PARTITION>, wherein a
> | community-discussed function is described in roughly specification-level
> | detail, with links to a reference implementation.
>
> * Marco Antoniotti
> | I am sorry to be sooo nagging (again) on such a stupid matter. But......
> | The name PARTITION is inappropriate. SPLIT-SEQUENCE is much more
> | descriptive of what the function does.
>
> * Tim Moore
> | Get over it!
>
> But "partition" is such a _fantastically_ bad name, especially to people
> who know a bit of mathematical terminology.

hmmm. my dictionary says partition means to divide into parts. if
partition means something else to mathematicians, that's fine, natural
language is like that, but it's a bit harsh to moan about someone using
a word correctly just because someone else took liberties with it.

besides, in a custody fight between mathematics and sequences over the
symbol-function of 'partition, well this is Lisp, I think sequences win.
we could solomon-like split the baby in half and not let anyone use
'partition, but consider this: the only sequence function I see listed
in CLTL2 which does not take a generic name (such as 'position) all for
itself is the trivial case of 'copy-seq.

i think the math literates amongst us gots to remember whose house they
are in when reading Lisp. (y'all can grok (+ 2 2) right?). seems to me
unadorned function names go to sequences, and it is the guest domains
that need to tack on tie-break syllables.

kenny
clinisys

Jochen Schmidt

Oct 12, 2001, 10:32:50 PM
Wade Humeniuk wrote:

>> This, of course, won't work the way you want if the delimited values
>> also contain spaces.
>>
>
> Of course not, but it does not have to in the case of dotted IP addresses.
>
> This raises the issue of a generalized parser/reader for any conceivable
> situation or writing a special purpose reader for specific cases. The
> time needed to implement a generalized solution (like regular expressions)
> outweighs the time to implement a 100 of the specific readers. Lazy man's
> way out.
>
> Here is a snippet of a parsing/reading problem from the LWW port for
> Aserve. Delimiters are slightly more complex.

[example snipped]

This parsing routine got replaced a while ago through a function using the
META Parser. The problem with using the READER for that stuff was that some
Browsers (Netscape) had a semicolon and some further characters behind the
date and if you wrap that string in parens, the closing paren is behind the
semicolon and therefore commented out.
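
The effect is easy to reproduce in isolation. The helper below is a made-up illustration, not part of portableaserve; it wraps a string in parentheses the way the reader-based parser does and reports when the list never gets closed.

```lisp
;; Hypothetical demonstration helper, not code from portableaserve.
(defun read-wrapped (string)
  "Wrap STRING in parentheses and READ it.
Return :UNTERMINATED-LIST if the reader signals an error."
  (let ((*read-eval* nil))
    ;; The standard specifies END-OF-FILE for an incomplete object;
    ;; ERROR is caught here to stay implementation-neutral.
    (handler-case (read-from-string (concatenate 'string "(" string ")"))
      (error () :unterminated-list))))

;; (read-wrapped "06 Nov 1994")        reads a three-element list
;; (read-wrapped "06 Nov 1994; path")  the ";" comments out the ")"
```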

The actual code in portableaserve is like this (which is a quick hack
written with META and not really nice...)

(eval-when (:compile-toplevel :load-toplevel :execute)
  (meta:enable-meta-syntax)
  (deftype alpha-char () '(and character (satisfies alpha-char-p)))
  (deftype digit-char () '(and character (satisfies digit-char-p))))

(defun date-to-universal-time (date)
  ;; convert a date string to lisp's universal time
  ;; we accept all 3 possible date formats

  ;; check preferred type first (rfc1123 (formerly rfc822)):
  ;;   Sun, 06 Nov 1994 08:49:37 GMT
  ;; now second best format (but used by Netscape sadly):
  ;;   Sunday, 06-Nov-94 08:49:37 GMT
  ;; finally the third format, from unix's asctime
  ;;   Sun Nov 6 08:49:37 1994

  (let (last-result)
    (meta:with-string-meta (buffer date)
      (labels ((make-result ()
                 (make-array 0
                             :element-type 'base-char
                             :fill-pointer 0 :adjustable t))
               (skip-day-of-week (&aux c)
                 (meta:match [$[@(alpha-char c)]
                              !(skip-delimiters)]))
               (skip-delimiters ()
                 (meta:match $[{#\: #\, #\space #\-}]))
               (word (&aux (old-index meta::index) c
                           (result (make-result)))
                 (or (meta:match [!(skip-delimiters) @(alpha-char c)
                                  !(vector-push-extend c result)
                                  $[@(alpha-char c)
                                    !(vector-push-extend c result)]
                                  !(setf last-result result)])
                     (progn (setf meta::index old-index) nil)))
               (integer (&aux (old-index meta::index) c
                              (result (make-result)))
                 (or (meta:match [!(skip-delimiters) @(digit-char c)
                                  !(vector-push-extend c result)
                                  $[@(digit-char c)
                                    !(vector-push-extend c result)]
                                  !(setf last-result
                                         (parse-integer result))])
                     (progn (setf meta::index old-index) nil)))
               (date (&aux day month year hours minutes seconds)
                 (and (meta:match [!(skip-day-of-week)
                                   {[!(word) !(setf month last-result)
                                     !(integer) !(setf day last-result)]
                                    [!(integer) !(setf day last-result)
                                     !(word) !(setf month
                                                    last-result)]}
                                   !(integer) !(setf year last-result)
                                   !(integer) !(setf hours last-result)
                                   !(integer) !(setf minutes
                                                     last-result)
                                   !(integer) !(setf seconds
                                                     last-result)])
                      ; (values seconds minutes hours day month)
                      (encode-universal-time seconds minutes hours day
                                             (net.aserve::compute-month
                                              (coerce month 'simple-string)
                                              0)
                                             year
                                             0))))
        (date)))))


ciao,
Jochen

--
http://www.dataheaven.de

Wade Humeniuk

Oct 13, 2001, 12:29:06 AM
> This parsing routine got replaced a while ago through a function using the
> META Parser. The problem with using the READER for that stuff was that some
> Browsers (Netscape) had a semicolon and some further characters behind the
> date and if you wrap that string in parens, the closing paren is behind the
> semicolon and therefore commented out.
>

Would it have worked to have substituted the #\; for #\space first? Same
kind of routine but discarding the extra vars in destructuring-bind?

> The actual code in portableaserve is like this (which is a quick hack
> written with META and not really nice...)
>

Wow, I would not have thought a macro like meta existed.

Wade

Bulent Murtezaoglu

Oct 13, 2001, 2:37:36 AM
>>>>> "KT" == Kenny Tilton <kti...@nyc.rr.com> writes:
[...]
KT> hmmm. my dictionary says partition means to divide into
KT> parts. if partition means something else to mathematicians,
KT> that's fine, natural language is like that, but it's a bit
KT> harsh to moan about someone using a word correctly just
KT> because someone else took liberties with it. [...]

Unfortunately it also means something to computer scientists, possibly
the same thing it means to mathematicians (what an equivalence relation
does to a set) so the overlap is not just with some remote mathematical
lingo. When you say partition, a CS type would think of sets, not strings.
I therefore don't think Erik was being unduly harsh.

Of course I was too distracted/lazy to say any of this and even read cll
when what became partition was being discussed, so I should probably shut
up now.

cheers,

BM

Kenny Tilton

Oct 13, 2001, 4:49:52 AM
Bulent Murtezaoglu wrote:

> When you say partition, a CS type would think of sets, not strings.
> I therefore don't think Erik was being unduly harsh.

<h> actually i was mimicking the teenager usage of "harsh", which usage
is highly exaggerated as with most teenspeak. and i was thinking of the
general case of objecting to someone using a word correctly, wasn't
thinking about EN's post at all at that point tho I can see why one
would construe it that way.

actually we used "partition" in our code recently in the sense you
described, in a partial DB replication scheme: a DB instance viewed as
partitioning the set of all DB instances according to whether the key
instance had a direct or indirect owning relationship of any given
instance.

that said, turning from my dictionary to my thesaurus I discover split
and partition listed together under "allocation". :(

do i hear you all saying that the objection is that this string
manipulation we are discussing takes an ordered sequence and chops it up
by finding certain delimiters and then crucially considering the order
when dividing up the string, ie, every element _between_ two delimiters
ends up in the same partition, whereas in partitioning order does not
matter, each set member gets tested individually with the predicate? if
so, ok, i get that distinction.
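
To make the set-theoretic sense concrete, here is a sketch of partitioning in the mathematical meaning: each element is classified on its own by a key function, and where it sits in the input has no bearing on which group it joins. The name PARTITION-BY-KEY is an assumption for this example only.

```lisp
;; Hypothetical illustration of the mathematical sense of "partition".
(defun partition-by-key (key list &key (test #'eql))
  "Group the elements of LIST into classes whose KEY values match under TEST.
Classes appear in order of first occurrence; adjacency plays no role."
  (let ((classes '()))      ; alist of (key-value . members), newest first
    (dolist (item list)
      (let* ((value (funcall key item))
             (class (assoc value classes :test test)))
        (if class
            (push item (cdr class))
            (push (list value item) classes))))
    (nreverse (mapcar (lambda (class) (reverse (cdr class))) classes))))

;; (partition-by-key #'evenp '(1 2 3 4 5))  =>  ((1 3 5) (2 4))
```

Here 1, 3 and 5 end up together although none of them are neighbours, which is exactly what a delimiter-based split can never produce.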

sadly, i just looked up "split" and though the definition sounded as if
order was a factor in the partitioning denoted by "split", the two
examples given were "split into groups" and "split up the money". :(

interesting, what synonym for partition implies order matters? i guess
"subseq" kinda hits the problem over the head (just checked, that was
omitted from the list of sequence functions I saw in CLTL2) so with that
precedent something like 'split-sequence or 'splitseq would indeed be
preferable.

kenny
clinisys


>
> >>>>> "KT" == Kenny Tilton <kti...@nyc.rr.com> writes:
> [...]
> KT> hmmm. my dictionary says partition means to divide into
> KT> parts. if partition means something else to mathematicians,
> KT> that's fine, natural language is like that, but it's a bit
> KT> harsh to moan about someone using a word correctly just
> KT> because someone else took liberties with it. [...]
>
> Unfortunately it also means something to computer scientists, possibly
> the same thing it means to mathematicians (what an equivalence relation
> does to a set) so the overlap is not just with some remote mathematical
>

Erik Naggum

Oct 13, 2001, 5:27:43 AM
* Kenny Tilton

| hmmm. my dictionary says partition means to divide into parts. if
| partition means something else to mathematicians, that's fine, natural
| language is like that, but it's a bit harsh to moan about someone using
| a word correctly just because someone else took liberties with it.

To repeat myself from the article you responded to, since a teenager's
attention span is so short:

At the very least, it should be called partition-sequence, but even this
sounds wrong to me.

The more general a name, the more general the functionality it should
provide in order to defend usurping the general name. If it only works
on sequences and only uses _one_ meaning of a word at the exclusion of
another, make it more specific. I posted the first version of the code
that got discussed and transmogrified and then renamed into "partition"
without any discussion here. It was called "split-sequence" as I recall.
The code that they base "partition" on was initially called just "split"
and renamed "partition". Bad move.

Common Lisp does not have a simple way to import a symbol from a package
under another name. This means the connection to a badly chosen name is
broken if you choose to rename it. This is all the more reason to be a
little careful when you name things very generally. "split" was horrible
in that sense, too. I notice in passing that Franz Inc's "aserve" has
split-on-character, split-into-words, and split-string functions which
all seem overly specific, but which are at least properly named.
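
The closest workaround is probably to install the same function under a second symbol, as sketched below. The PARTITION defined here is only a stand-in written for this example, not the library code, and whether this counts as "renaming" is open to interpretation.

```lisp
;; Stand-in for a library function whose name you would rather not use.
(defun partition (delimiter string)
  (loop with len = (length string)
        for start = 0 then (1+ stop)
        for stop = (or (position delimiter string :start start) len)
        collect (subseq string start stop)
        while (< stop len)))

;; The alias is a separate symbol that merely shares the function object.
(setf (fdefinition 'split-sequence) #'partition)

;; (split-sequence #\. "210.23.138.16") now behaves like PARTITION.
```

The two symbols stay unrelated as far as the language is concerned: a later redefinition of PARTITION is not picked up by the alias, and documentation attached to one name is not found under the other.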

///

Christophe Rhodes

Oct 13, 2001, 5:45:23 AM
Erik Naggum <er...@naggum.net> writes:

> * Christophe Rhodes
> | See <URL:http://ww.telent.net/cliki/PARTITION>, wherein a
> | community-discussed function is described in roughly specification-level
> | detail, with links to a reference implementation.
>
> * Marco Antoniotti
> | I am sorry to be sooo nagging (again) on such a stupid matter. But......
> | The name PARTITION is inappropriate. SPLIT-SEQUENCE is much more
> | descriptive of what the function does.
>
> * Tim Moore
> | Get over it!
>
> But "partition" is such a _fantastically_ bad name, especially to people
> who know a bit of mathematical terminology.

I can't help but be slightly irritated by this, I'm afraid, as I noted
at the time the conspicuous absence of certain people (not just Erik)
in the debate about the splitting function and its naming, at times
when I thought they might well have something to contribute.

Nevertheless, the question is probably more "so what are we going to
do about it?" Well, that's a good question... my personal attitude at
this point right now is "why bother?"

No doubt my idealism will resurface at some point,

Russell Senior

Oct 13, 2001, 6:54:13 AM
>>>>> "Erik" == Erik Naggum <er...@naggum.net> writes:

Erik> [...] I posted the first version of the code that got discussed
Erik> and transmogrified and then renamed into "partition" without any
Erik> discussion here. It was called "split-sequence" as I recall.

I think I might have been the one to call it split-sequence. This
function was discussed on this newsgroup in September 1998, initially
in a thread titled "I don't understand Lisp". During a discussion of
regular expressions (I think it was) Erik posted a function with a
slightly different interface and purpose called delimited-substrings,
and I followed up with one (pretty horrifying, but functioning) called
split-sequence, which I had adapted/generalized from one I'd found
called split-string. Over the next few days it was substantially
revised/rewritten several times on the newsgroup by various authors.
At the end of that thread, it was still being called split-sequence,
which I continue to like and still use.

It appears this is what resurfaced in a still mutating form about a
year ago, called variously split and partition.

When the Christophe Rhodes "split-sequence/partition" thread started
back in June/July, I wasn't paying very much attention and so I didn't
participate.

BTW, one useful feature that got lost along the way seems to be the
ability to provide a value for empty subsequences.
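
For illustration only, such a feature might look like the sketch below. The function name and the :EMPTY-VALUE keyword are guesses made here; the interface of the 1998 code is not reproduced.

```lisp
;; Hypothetical sketch; the keyword name is an assumption, not the 1998 API.
(defun split-with-empty-value (delimiter sequence
                               &key (empty-value nil empty-value-p))
  "Split SEQUENCE at DELIMITER.  When EMPTY-VALUE is supplied, it replaces
every empty subsequence in the result."
  (loop with len = (length sequence)
        for start = 0 then (1+ stop)
        for stop = (or (position delimiter sequence :start start) len)
        collect (if (and empty-value-p (= start stop))
                    empty-value
                    (subseq sequence start stop))
        while (< stop len)))

;; (split-with-empty-value #\, "a,,b" :empty-value :missing)
```

This is handy for comma-separated records, where a missing field is easier to spot as a marker than as an empty string.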

For what it's worth, I still like the name split-sequence.

Erik Naggum

Oct 13, 2001, 7:16:32 AM
* Christophe Rhodes <cs...@cam.ac.uk>

| I can't help but be slightly irritated by this, I'm afraid, as I noted
| at the time the conspicuous absence of certain people (not just Erik)
| in the debate about the splitting function and its naming, at times
| when I thought they might well have something to contribute.

Where did this debate occur? I have just stuffed a private archive of a
_lot_ of news into a huge database, and cannot find any discussion of the
name "partition" in this forum. If you go away and make up your own
community and you do something stupid and somebody complains about it, it
is fairly bad taste to blame the people _you_ left behind for not taking
part in your discussion. This is one of the reasons I do not think those
mini-communities are doing any good. You need a large number of people
to weed out the silly ideas that look good to everyone in a small group.

| Nevertheless, the question is probably more "so what are we going to do
| about it?" Well, that's a good question... my personal attitude at this
| point right now is "why bother?"

Yeah, why use something that is so badly named? So, who cares?

As I have indicated, I think splitting strings and creating huge amounts
of garbage during parsing is bad software design. The incessant copying
of characters that plagues most parsers is _the_ source of bad performance.

///

Erik Naggum

Oct 13, 2001, 7:40:35 AM
* Russell Senior <sen...@aracnet.com>

| I think I might have been the one to call it split-sequence.

Yes. Thank you for the correction and clarification.

| When the Christophe Rhodes "split-sequence/partition" thread started back
| in June/July, I wasn't paying very much attention and so I didn't
| participate.

It looked to me like nobody really liked "partition" and the consensus
was clearly on "split-sequence". The name "partition" was just handed to
us as something to accept despite the strong opposition. However, I have
not found the discussion behind this comment in "partition.lisp":

;;; * naming the function PARTITION rather than SPLIT.

I wonder how this change was chosen. Where can I find the discussion?

///

Christophe Rhodes

Oct 13, 2001, 9:31:06 AM
Erik Naggum <er...@naggum.net> writes:

> * Christophe Rhodes <cs...@cam.ac.uk>
> | I can't help but be slightly irritated by this, I'm afraid, as I noted
> | at the time the conspicuous absence of certain people (not just Erik)
> | in the debate about the splitting function and its naming, at times
> | when I thought they might well have something to contribute.
>
> Where did this debate occur? I have just stuffed a private archive of a
> _lot_ of news into a huge database, and cannot find any discussion of the
> name "partition" in this forum.

A quick google gets me

<URL:http://groups.google.com/groups?q=group:comp.lang.lisp+partition+split-sequence&hl=en&rnum=3&selm=y6clmm3lajm.fsf%40octagon.mrl.nyu.edu>

for instance; there's a thread of 35 articles, according to google.

> If you go away and make up your own
> community and you do something stupid and somebody complains about it, it
> is fairly bad taste to blame the people _you_ left behind for not taking
> part in your discussion.

Granted; I thought that comp.lang.lisp was the closest we had to a
lisp community these days. If there's somewhere else that I should
have been writing, let me know, please!

> This is one of the reasons I do not think those
> mini-communities are doing any good. You need a large number of people
> to weed out the silly ideas that look good to everyone in a small group.

True.

> | Nevertheless, the question is probably more "so what are we going to do
> | about it?" Well, that's a good question... my personal attitude at this
> | point right now is "why bother?"
>
> Yeah, why use something that is so badly named? So, who cares?

Now, if on reading the thread one article of which is cited above, you
observe that I ignored the Kassandras telling me that "partition" is a
bad name, you might be more justified. Since I sincerely doubt (though
do disabuse me if this isn't true) that any vendor has yet adopted
partition, the cost to the community, wherever it resides, in changing
the specification to the extent of the name (to split-sequence,
cleave, or any of the other names discussed in that thread; see for
example
<URL:http://groups.google.com/groups?q=g:thl1983237200d&hl=en&selm=87ofr2hqdq.fsf%40palomba.bananos.org>)
is minimal[*]. Given this, it's not a problem.

> As I have indicated, I think splitting strings and creating huge amounts
> of garbage during parsing is bad software design. The incessant copying
> of characters that plagues most parsers is _the_ source of bad performance.

And this is another matter entirely. Nevertheless, given the frequency
of requests in this forum for "a string splitting function" it might
be useful to have something that was designed rather than 5 ad-hoc
security-flawed answers for each occasion.

Cheers,

Christophe

[*] It might cause me to persuade Dan Barlow to implement topic
synonyms for CLiki.

Erik Naggum

Oct 13, 2001, 1:02:24 PM
* Christophe Rhodes

| A quick google gets me
|
| <URL:http://groups.google.com/groups?q=group:comp.lang.lisp+partition+split-sequence&hl=en&rnum=3&selm=y6clmm3lajm.fsf%40octagon.mrl.nyu.edu>
|
| for instance; there's a thread of 35 articles, according to google.

No, this is not discussing the transition from "split" _to_ "partition".
Specifically, no article even attempts to explain how "partition" was
chosen and why it is a good name. The overwhelmingly negative response
to that name when you _did_ publish it was just ignored, as you admit. I
wonder how you can complain about people not raising their concerns when
you just walked away when they did.

| Since I sincerely doubt (though do disabuse me if this isn't true) that
| any vendor has yet adopted partition, the cost to the community, wherever
| it resides, in changing the specification to the extent of the name
| ([...]) is minimal[*]. Given this, it's not a problem.

It is and remains a problem if it is not _actually_ done.

* Erik Naggum


| As I have indicated, I think splitting strings and creating huge amounts
| of garbage during parsing is bad software design. The incessant copying
| of characters that plagues most parsers is _the_ source of bad performance.

* Christophe Rhodes


| And this is another matter entirely.

Well, some of us think that if people ask for tail recursion, they should
be told about other iteration constructs. It is downright sad that
Common Lisp, such a great language for its ability to maintain identity
of objects and therefore inherently "object-oriented" before anyone
invented that term, has succumbed to the very primitive properties of C
and Unix tools, where copying characters around all the time is _not_ seen
as pretty damn stupid, which it is. Strings are fairly expensive objects
in Common Lisp -- they actually are in any language -- but copying text
is a more expensive operation in Common Lisp than in languages that do it
so often they have super-optimized copying functions. This is even more
true when the Common Lisp system uses Unicode internally and talks to a
world that still uses 7- or 8-bit-encoded character sets.

| Nevertheless, given the frequency of requests in this forum for "a string
| splitting function" it might be useful to have something that was
| designed rather than 5 ad-hoc security-flawed answers for each occasion.

Giving people what they want when they express their desire in the form
of an implementation of a solution they could not write on their own is
never going to help them. People who ask such questions need to be told
that they have to present their problem and not the solution they have
chosen in their _ignorance_ of the solution space.

But from where did "security-flawed" enter the picture? I sense another
matter entirely. :)

///
--
The United Nations before and after the leadership of Kofi Annan are two
very different organizations. The "before" United Nations did not deserve
much credit and certainly not a Nobel peace prize. The "after" United
Nations equally certainly does. I applaud the Nobel committee's choice.

Christophe Rhodes

Oct 13, 2001, 4:00:44 PM
Erik Naggum <er...@naggum.net> writes:

> * Christophe Rhodes
> | A quick google gets me
> |
> | <URL:http://groups.google.com/groups?q=group:comp.lang.lisp+partition+split-sequence&hl=en&rnum=3&selm=y6clmm3lajm.fsf%40octagon.mrl.nyu.edu>
> |
> | for instance; there's a thread of 35 articles, according to google.
>
> No, this is not discussing the transition from "split" _to_ "partition".
> Specifically, no article even attempts to explain how "partition" was
> chosen and why it is a good name.

Hmm. I'm afraid in that case I can't refer you to the recent thread.
However, and without wishing to hide behind other people as I confess
to liking the name, please see

<URL:http://groups.google.com/groups?hl=en&rnum=5&selm=878zrlp1cr.fsf%40orion.bln.pmsf.de>

which happened to be my starting point. Call it historical accident,
if you will; I certainly didn't mean to hide this. On rereading the
2001 thread, I agree that this wasn't made plain.

> The overwhelmingly negative response
> to that name when you _did_ publish it was just ignored, as you admit. I
> wonder how you can complain about people not raising their concerns when
> you just walked away when they did.

Some liked it, maybe most didn't; there was no consensus that I could
see on a preferred alternative; and there was agreement (or at least
nem con) when it was said:

Pierre Mai:
> In any case I agree that as long
> as the name is not unduly ugly, this is probably the least problem
> hindering acceptance.

It would appear that this isn't true. It would have been nice to know
that there was strong feeling from several sources.

> | Since I sincerely doubt (though do disabuse me if this isn't true) that
> | any vendor has yet adopted partition, the cost to the community, wherever
> | it resides, in changing the specification to the extent of the name
> | ([...]) is minimal[*]. Given this, it's not a problem.
>
> It is and remains a problem if it is not _actually_ done.

A community-based standard is obviously only as strong as the
community's use for it. However, my impression was in fact that few
people were at all interested, few were paying attention and few
actually cared. I'm unfairly maligning people, but this was my
_impression_.

There is no defined mechanism for finalizing or changing
specifications of this kind; also, the best-formatted and
most-easily-accessible version of the specification is on a
world-writeable web page. Maybe we need a more formal structure for
these things (à la SRFI)? Discussion period, followed by vote? I don't
know. Since I seem still to be at the focus, I'm willing, I suppose,
to tally votes, or something.

I mean, I don't know. Fortunately, should the consensus be for a name
change, we've recently discussed here ways of deprecating
interfaces... :-)

> But from where did "security-flawed" enter the picture? I sense another
> matter entirely. :)

Oh, that was a reference to the read-based solutions to this problem
that appear with appalling regularity. :)
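
To make the risk concrete, here is a minimal sketch of a READ-based splitter of the kind in question. With *READ-EVAL* at its default of T, the #. reader macro evaluates whatever form the input contains. READ-FIELDS and READ-FIELDS-SAFELY are hypothetical names invented for this illustration.

```lisp
;; Hypothetical READ-based splitter: turn each delimiter into a space
;; and let the Lisp reader parse the fields.
(defun read-fields (string &optional (delimiter #\,))
  (with-input-from-string (in (substitute #\Space delimiter string))
    ;; The stream itself serves as a unique end-of-file marker.
    (loop for object = (read in nil in)
          until (eq object in)
          collect object)))

;; The same splitter with read-time evaluation disabled.  This blocks #.
;; but the reader will still intern symbols and build arbitrary objects.
(defun read-fields-safely (string &optional (delimiter #\,))
  (let ((*read-eval* nil))
    (read-fields string delimiter)))

;; (read-fields "10,20,30")               ; plain integers are read as such
;; (read-fields "10,20,#.(* 6 7)")        ; the reader evaluates (* 6 7)
;; (read-fields-safely "10,20,#.(* 6 7)") ; signals an error instead
```

Binding *READ-EVAL* to NIL only closes the #. hole. For input that is supposed to contain nothing but integers, PARSE-INTEGER avoids the reader altogether.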

Christophe

PS: I would like to say, since I've invoked Pierre's posts twice, just
in case that it isn't clear: I am speaking for myself only in this
message. There is no cabal.

Thomas F. Burdick

Oct 14, 2001, 8:44:38 PM
Daniel Pittman <dan...@rimspace.net> writes:

> I am looking for the simplest way to split a string into four strings
> based on a character -- to parse an IP address string, specifically.
>
> What is the best, easiest, fastest, etc, way to split a string into
> substrings based on a character position. In Emacs Lisp I would just:
>
> (let ((address "210.23.138.16"))
>   (split-string address "\\.")) ; second arg is regexp to split on.
>
> Now, I don't actually need regexp functionality here; a literal '.' is
> enough for me.
>
> This strikes me as the sort of idiom that would be common enough for
> Common Lisp[1] to feature it as part of the standard.

Except that, as I'm sure you've seen by now, it's a source of
contention as to how exactly this should be done. I'd had a
SPLIT-VECTOR function that I used to use:

[57]> (split-vector " la dee dah " #\space :start 1)
("la" "dee" "dah" "")

But I didn't like all the pointless consing. So I'd been doing it by
hand with LOOP and reusing the string. Then, I felt stupid for using
an idiom that I could turn into a more concise macro. So I came up
with DO-VECTOR-SPLIT. I don't really like the name, but it does act
like a DO-... macro.

(defun call-splitting-vector (vector splitter fn
                              &key (start 0)
                                   (end (length vector))
                                   (test #'eq))
  (when (< start 0)
    (error ":START should be <= 0, not ~S" start))
  (when (> end (length vector))
    (error ":END is out of bounds"))
  (loop with begin = start
        for i from start below end
        when (funcall test splitter (aref vector i))
          do (funcall fn begin i)
             (setf begin (1+ i))
        finally (funcall fn begin i)))

(defmacro do-vector-split ((start end (vector splitter)
                            &rest keys &key &allow-other-keys)
                           &body forms)
  (when (null start) (setf start (gensym)))
  (when (null end) (setf end (gensym)))
  `(call-splitting-vector ,vector ,splitter
                          #'(lambda (,start ,end) ,@forms)
                          ,@keys))

You can use DO-VECTOR-SPLIT to collect a list of substrings:

[58]> (let ((result ())
            (string " la dee dah "))
        (do-vector-split (s e (string #\space)
                          :start 1)
          (push (subseq string s e) result))
        (nreverse result))
("la" "dee" "dah" "")

Of course, you can always define SPLIT-VECTOR in terms of
CALL-SPLITTING-VECTOR:

(defun split-vector (vector splitter &rest keys &key &allow-other-keys)
  (let ((result ()))
    (apply #'call-splitting-vector
           vector splitter #'(lambda (s e)
                               (push (subseq vector s e) result))
           keys)
    (nreverse result)))

But you'll probably only very rarely need it, because you're probably
splitting the string as an intermediate step to some other end
(stuffing numbers into a vector that represents an IP address, for
example), so you may as well avoid consing up new strings and a new
list just to throw them away.
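
For the IP address case, that might look something like the sketch below. It restates the two definitions so that it stands alone, with three editorial changes that are not in the code as posted: the :START message reads >= 0, the default test is EQL (EQ on characters is not guaranteed by the standard), and the FINALLY clause uses END rather than the loop variable. PARSE-IP-ADDRESS is a hypothetical name.

```lisp
;; Restated from the post above, with the editorial changes noted.
(defun call-splitting-vector (vector splitter fn
                              &key (start 0) (end (length vector))
                                   (test #'eql))
  (when (< start 0)
    (error ":START should be >= 0, not ~S" start))
  (when (> end (length vector))
    (error ":END is out of bounds"))
  (loop with begin = start
        for i from start below end
        when (funcall test splitter (aref vector i))
          do (funcall fn begin i)
             (setf begin (1+ i))
        finally (funcall fn begin end)))

(defmacro do-vector-split ((start end (vector splitter)
                            &rest keys &key &allow-other-keys)
                           &body forms)
  (when (null start) (setf start (gensym)))
  (when (null end) (setf end (gensym)))
  `(call-splitting-vector ,vector ,splitter
                          #'(lambda (,start ,end) ,@forms)
                          ,@keys))

;; Hypothetical helper, not from the thread.
(defun parse-ip-address (string)
  "Parse a dotted quad such as \"210.23.138.16\" into a 4-element vector."
  (let ((octets (make-array 4 :element-type '(unsigned-byte 8)))
        (count 0))
    (do-vector-split (s e (string #\.))
      (when (>= count 4)
        (error "Too many fields in ~S" string))
      ;; PARSE-INTEGER reads directly from STRING between S and E,
      ;; so no substring is allocated for the field.
      (let ((octet (parse-integer string :start s :end e)))
        (unless (<= 0 octet 255)
          (error "Octet ~D out of range in ~S" octet string))
        (setf (aref octets count) octet))
      (incf count))
    (unless (= count 4)
      (error "Expected 4 fields in ~S, found ~D" string count))
    octets))

;; (parse-ip-address "210.23.138.16")
```

The only allocation is the result vector itself. An empty field such as the one in "1..2.3" makes PARSE-INTEGER signal an error, which is probably what you want for an address.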

G. W. Puckett

Oct 15, 2001, 10:29:53 AM

t...@conquest.OCF.Berkeley.EDU (Thomas F. Burdick) writes:
> (when (< start 0)
>   (error ":START should be <= 0, not ~S" start))

Is this error message incorrect?


--
I. M. Puckett replace "sendnospam" with "puckett"

Thomas F. Burdick

Oct 15, 2001, 3:40:24 PM
sendn...@nortelnetworks.com (G. W. Puckett) writes:

> t...@conquest.OCF.Berkeley.EDU (Thomas F. Burdick) writes:
> > (when (< start 0)
> >   (error ":START should be <= 0, not ~S" start))
>
> Is this error message incorrect?

Oops, I've even gotten that error before, but since I wrote the
message, I knew what I meant:
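
The corrected lines did not survive in this copy of the post. Presumably the check was meant to read as follows; CHECK-START is a hypothetical wrapper added only so that the fragment stands alone.

```lisp
;; Hypothetical wrapper around the presumably intended check.
(defun check-start (start)
  "Signal an error unless START is a non-negative index; return START."
  (when (< start 0)
    (error ":START should be >= 0, not ~S" start))
  start)
```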

Rob Warnock

Oct 29, 2001, 10:11:36 PM
Erik Naggum <er...@naggum.net> wrote:
+---------------
| ;;; * naming the function PARTITION rather than SPLIT.
| I wonder how this change was chosen. Where can I find the discussion?
+---------------

I only saved a few of those, but here are some snippets from near the
end of the thread ("Subject: Re: (final?) PARTITION specification").
Hopefully the Message-IDs will help you find them:

Date: 05 Jul 2001 10:23:25 -0400
From: Marco Antoniotti <mar...@cs.nyu.edu>
Message-ID: <y6clmm3...@octagon.mrl.nyu.edu>
...
I would use SPLIT-SEQUENCE and SPLIT-SEQUENCE-IF. In this way
it is clear that these functions work on any sequence.
====
Date: 09 Jul 2001 23:46:56 +0100
From: Christophe Rhodes <cs...@cam.ac.uk>
Message-ID: <sq66d12...@lambda.jesus.cam.ac.uk>
...
I remain unconvinced by the legion clamouring for a name change from
partition, to be honest. I think that anything I choose will either
clash with something else or be hideously ugly (or both, of course);
so I'm going to stick to my guns and go with PARTITION. Sorry if that
makes the code or the specification unuseable by anyone.
====
Date: 10 Jul 2001 14:01:07 +0200
Subject: Re: (final?) PARTITION specification
Message-ID: <87bsmt9...@orion.bln.pmsf.de>
...
In any case, while I'm the original proponent of sticking
to PARTITION, I'd like to add that I could also live with
SPLIT-SEQUENCE or maybe SPLIT-SEQ, if it mattered.

The general sense I got was that a *lot* of people were initially for
SPLIT, but then someone mentioned a conflict with the series package,
so most shifted to SPLIT-SEQUENCE, with decreasing support for PARTITION
as time wore on... except for Christophe. [Apologies if I've severely
mis-stated anything.]


-Rob

-----
Rob Warnock, 30-3-510 <rp...@sgi.com>
SGI Network Engineering <http://www.meer.net/~rpw3/>
1600 Amphitheatre Pkwy. Phone: 650-933-1673
Mountain View, CA 94043 PP-ASEL-IA

[Note: aaan...@sgi.com and zedw...@sgi.com aren't for humans ]
