Splitting a string on a character...

Cory Spencer

May 6, 2002, 2:53:13 PM

Just a quickie question - is there already a Common Lisp function that
will split a string on a given character?

ie) will perform a similar function as this:

(defun split (chr str)
  (let ((pos (position chr str)))
    (if (null pos)
        str
        (cons (substring str 0 pos)
              (split chr (substring str (+ pos 1)))))))

--
Cory Spencer <cspe...@interchange.ubc.ca>

Barry Margolin

May 6, 2002, 3:02:55 PM

In article <ab6jeo$vs$1...@nntp.itservices.ubc.ca>,

Cory Spencer <cspe...@interchange.ubc.ca> wrote:
>Just a quickie question - is there already a Common Lisp function that
>will split a string on a given character?

No.

>ie) will perform a similar function as this:
>
> (defun split (chr str)
>   (let ((pos (position chr str)))
>     (if (null pos)
>         str
>         (cons (substring str 0 pos)
>               (split chr (substring str (+ pos 1)))))))

You should probably return (list str) in the terminating case, so that the
final result is a proper list rather than a dotted list.
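
A minimal sketch of that fix, for illustration only. It assumes SUBSEQ in place of substring (which is not a standard Common Lisp function), and the name split-recursive is hypothetical, chosen so it does not clash with other definitions of split:

```lisp
;; Sketch only: SUBSEQ stands in for the nonstandard SUBSTRING, and the
;; terminating case returns (list str) so the result is a proper list.
(defun split-recursive (chr str)
  (let ((pos (position chr str)))
    (if (null pos)
        (list str)
        (cons (subseq str 0 pos)
              (split-recursive chr (subseq str (+ pos 1)))))))
```

With this change, (split-recursive #\/ "abc/def//ghi") returns ("abc" "def" "" "ghi") rather than a dotted list.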

--
Barry Margolin, bar...@genuity.net
Genuity, Woburn, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.

Edi Weitz

May 6, 2002, 3:10:45 PM

Cory Spencer <cspe...@interchange.ubc.ca> writes:

> Just a quickie question - is there already a Common Lisp function that
> will split a string on a given character?
>
> ie) will perform a similar function as this:
>
> (defun split (chr str)
>   (let ((pos (position chr str)))
>     (if (null pos)
>         str
>         (cons (substring str 0 pos)
>               (split chr (substring str (+ pos 1)))))))

No, but you might want to take a look at

<http://ww.telent.net/cliki/SPLIT-SEQUENCE>

Also, some regex packages provide string splitters that can split on
arbitrary regular expressions. See

<http://www.ccs.neu.edu/home/dorai/pregexp/pregexp-Z-H-2.html#%_sec_2.4>

for an example.

Edi.

Thomas F. Burdick

May 6, 2002, 3:25:10 PM

Cory Spencer <cspe...@interchange.ubc.ca> writes:

> Just a quickie question - is there already a Common Lisp function that
> will split a string on a given character?

No. Personally, I'm glad. All the functions that deal with sequences
let you specify :start and :end, and we have POSITION. Which all add
up to a more reasonable, less garbage-y way of doing things. You can
see more of my thoughts on the matter here:

<http://groups.google.com/groups?q=g:thl963481286d&dq=&hl=en&selm=xcvvghhpw95.fsf%40conquest.OCF.Berkeley.EDU>
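
One way to read this, sketched under assumptions: instead of building a list of substrings, walk the string with POSITION and hand each field's bounding indices to a function, which can then pass them on as :start and :end. The names map-fields and sum-fields are hypothetical, and this is not necessarily what the linked post proposes:

```lisp
;; Hypothetical helper: calls FN with the START and END indices of each
;; field instead of allocating a substring per field.
(defun map-fields (fn delimiter string)
  (loop for start = 0 then (1+ end)
        for end = (position delimiter string :start start)
        do (funcall fn start (or end (length string)))
        while end))

;; Example use: sum the decimal fields of a string. PARSE-INTEGER takes
;; :start and :end itself, so no intermediate strings are created.
(defun sum-fields (delimiter string)
  (let ((sum 0))
    (map-fields (lambda (start end)
                  (incf sum (parse-integer string :start start :end end)))
                delimiter string)
    sum))
```

For example, (sum-fields #\, "10,20,12") returns 42.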

--
/|_ .-----------------------.
,' .\ / | No to Imperialist war |
,--' _,' | Wage class war! |
/ / `-----------------------'
( -. |
| ) |
(`-. '--.)
`. )----'

Harald Hanche-Olsen

May 6, 2002, 4:30:53 PM

+ Cory Spencer <cspe...@interchange.ubc.ca>:

| Just a quickie question - is there already a Common Lisp function that
| will split a string on a given character?

Did you check the HyperSpec? If so, did you see it in either the
Strings or the Sequences chapter? If you don't know what the
HyperSpec is, please do yourself a favour and go find it here:

http://www.xanalys.com/software_tools/

(not sure if this address is current -- I use my own local copy).

Apart from looking somewhat Scheme-ish (relying on recursion rather
than iteration), it won't work, because substring is not in the
language. With subseq instead it runs, but it probably does not quite
do what you expected of it:

(split #\/ "abc/def//ghi") ==> ("abc" "def" "" . "ghi")

(Note the dot.) This is probably a FAQ, but here is my suggestion for
a solution anyway:

(defun split (thing sequence)
  (loop for start = 0 then (1+ end)
        for end = (position thing sequence :start start)
        collect (subseq sequence start end)
        while end))

Note that it will split any sequence, not just a string:

(split #\/ "abc/def//ghi") ==> ("abc" "def" "" "ghi")
(split 0 #(3 1 0 5 7 1 9 0 0 5)) ==> (#(3 1) #(5 7 1 9) #() #(5))
(split 0 '(3 1 0 5 7 1 9 0 0 5)) ==> ((3 1) (5 7 1 9) NIL (5))

Neat, eh?

--
* Harald Hanche-Olsen <URL:http://www.math.ntnu.no/~hanche/>
- Yes it works in practice - but does it work in theory?

Erik Naggum

May 6, 2002, 10:40:41 PM

* Cory Spencer

| Just a quickie question - is there already a Common Lisp function that
| will split a string on a given character?

Most often, when people ask quickie questions, they have been working
themselves through what one would think of as a labyrinth where they make
brief excursions in the wrong direction and self-correct when they hit
the wall, so to speak. When they hit the wall and do not self-correct,
they post a quickie question, but there is an arbitrary amount of
backtracking involved in providing the right answer. Just moving the person
into a new labyrinth without the particular wall they have run into is
seldom the best answer, as the wrong choice they have made will lead them
right into another wall shortly thereafter. Therefore, a "quickie" is a
strong signal to experienced problem-solvers that something is wrong: The
requestor is stuck, but does not think he should have been. However, if
his thinking were correct, he would not be stuck. Yet he is, and that is
a hint that the amount of backtracking required will be significant and
that is just the opposite of a "quickie".

| ie) will perform a similar function as this:

Generally speaking, a reader or parser of some sort.

It is quite important to realize that you will never, ever have a case
where you can entirely get rid of the "splitting" character. If you
think you can legitimately expect this, you are just too inexperienced at
what you are doing and will run into a problem sooner or later. Let me
give you a few examples. Under Unix, you cannot have a colon in your
login name, in your home directory name, in your real name, or in your
shell, because the colon separates these fields in a system password
file. (Not to mention null bytes and newlines.) This is just too dumb
to be believable on the face of it, but it is actually the case. Unix
freaks do not think this is a problem because they internalize the rules
and do not _want_ a colon in those places. However, software that
updates the password file has to do sanity checks in order not to expose
the system to serious security risks because there is no way to escape a
payload colon from the delimiting colon. In the standard Unix shells,
whitespace separates arguments, but you have several escaping forms to
allow whitespace to exist in arguments. All in all, the mechanisms that
are used in the shell are quite arcane and difficult to predict from a
program, but a user can usually deal with it, in the standard Unix idea
of "usually". Then there is HTML and URL's and all that crap. To make
sure that a character is always a payload character, it must be written
as &#nnn;, where nnn is the ISO 10646 code for the character, or you have to
engage in table lookups, context-sensitive parsing rules, and all sorts
of random weirdness. Likewise, in URL's, it is incredibly hard to get
all you want through to the other side. Recently, I subscribed to the
Unabridged Merriam-Webster dictionary, and they need the e-mail address
as the username. It turned out to be very hard to write a URL that had a
payload @ in the username and a syntax @ before the hostname. I actually
find such things absolutely incredible -- to be so thoughtless must have
been _really_ hard.

This is why you should not use position to find a character to split on,
you should use a state machine that traverses the string and finds only
those (matching) characters that are syntactically relevant, not those
(matching) characters that are (or should be) payload characters. A
regular expression is _not_ sufficient for this task.
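
A minimal sketch of such a state machine, assuming purely for illustration that a backslash marks the next character as payload. The post does not prescribe any escape convention, and the name split-escaped and its :escape keyword are hypothetical, not taken from any library:

```lisp
;; Sketch of a two-state splitter. The backslash escape convention is an
;; assumption for illustration only.
(defun split-escaped (delimiter string &key (escape #\\))
  (let ((fields '())
        (field (make-string-output-stream))
        (state :normal))
    (loop for ch across string
          do (ecase state
               (:normal
                (cond ((char= ch escape)
                       (setf state :escaped))
                      ((char= ch delimiter)
                       ;; Syntactic delimiter: close the current field.
                       (push (get-output-stream-string field) fields))
                      (t
                       (write-char ch field))))
               (:escaped
                ;; Whatever follows the escape is payload, even a delimiter.
                (write-char ch field)
                (setf state :normal))))
    ;; The last field ends at the end of the string.
    (push (get-output-stream-string field) fields)
    (nreverse fields)))
```

Here (split-escaped #\: "a\\:b:c") returns ("a:b" "c"): the first colon is payload, the second is syntax. A trailing escape character with nothing after it is silently dropped in this sketch; real input formats would need a defined rule for that case.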
--
In a fight against something, the fight has value, victory has none.
In a fight for something, the fight is a loss, victory merely relief.

70 percent of American adults do not understand the scientific process.

Wade Humeniuk

May 7, 2002, 12:00:10 PM

"Erik Naggum" <er...@naggum.net> wrote in message news:32297280...@naggum.net...
> * Cory Spencer

>
> This is why you should not use position to find a character to split on,
> you should use a state machine that traverses the string and finds only
> those (matching) characters that are syntactically relevant, not those
> (matching) characters that are (or should be) payload characters. A
> regular expression is _not_ sufficient for this task.

Thanks for the post, Erik. What you said is very true. I have attached some code that
implements parsing time formats that I use in my running log program. I was amazed that
it got so large for such a simple spec but it was necessary to reliably dynamically
determine if user input was valid during any point of entering the data from a
time-input-pane.

Wade

time.lisp

jb

May 7, 2002, 5:06:52 PM

Erik Naggum wrote:

> * Cory Spencer
> | Just a quickie question - is there already a Common Lisp function that
> | will split a string on a given character?
>
> Most often, when people ask quickie questions, they have been working
> themselves through what one would think of as a labyrinth where they
> make brief excursions in the wrong direction and self-correct when they
> hit the wall, so to speak. When they hit the wall and do not
> self-correct, they post a quickie question, but there is an arbitrary
> amount of backtracking involved in providing the right answer. Just
> moving the person into a new labyrinth without the particular wall they
> have run into is seldom the best answer, as the wrong choice they have
> made will lead them right into another wall shortly thereafter.
> Therefore, a "quickie" is a strong signal to experienced
> problem-solvers that something is wrong: The requestor is stuck, but
> does not think he should have been. However, if his thinking were
> correct, he would not be stuck. Yet he is, and that is a hint that the
> amount of backtracking required will be significant and that is just
> the opposite of a "quickie".

This is not always true. I have just changed my OS. I think it is completely
normal if I do not know how to solve even simple UNIX problems. Then it is
very helpful if I can ask somebody. (For example, I could not install
anti-aliased fonts in Qt, and the right hint, which solved my problem,
consisted of a single sentence.)

Now when I am acting as a teacher and one of my pupils asks me a "simple"
question I carefully investigate whether he is having a more serious
problem. In a newsgroup, however (for example in de.sci.mathematik), I simply
answer the question and do not care about his deeper problems.

> | ie) will perform a similar function as this:
>
> Generally speaking, a reader or parser of some sort.
>

> It is quite important [...]

I do not think I have understood this deep essay on payload characters,
whatever they may be, and I wonder if the original poster did.
I must admit, however, that I do not understand the closing remark on 70% of
the American adults either.

--
J B

Il n'y a guère dans la vie qu'une préoccupation grave: c'est la mort;
(Dumas)


jb

May 7, 2002, 5:11:10 PM

Sorry, jb = Janos Blazi.

Herb Martin

May 7, 2002, 8:08:28 PM

> Neat, eh?

I love it. Thanks.

Herb Martin
Try ADDS for great Weather too:
http://adds.aviationweather.noaa.gov/projects/adds