Search & Replace in sequences

85 views
Skip to first unread message

Rudolf Schlatte

unread,
Mar 19, 2000, 3:00:00 AM3/19/00
to

I am doing some string hacking right now, and would be grateful if any
of you could spare some comments on my code.

Here is a typical routine, it escapes every character of a given set
in its input string:


(defun escape-characters (latex-string &optional (chars "$&%#_{}"))
(let ((last-found 0)
(new-string (copy-seq latex-string)))
(while last-found
(setf last-found (position-if (lambda (char) (find char chars))
new-string :start last-found))
(when last-found
(setf new-string (concatenate 'string
(subseq new-string 0 last-found)
(string #\\)
(subseq new-string last-found))
;; don't want to loop forever
last-found (+ 2 last-found))))
new-string))


This works ok and the program does not have to run fast or anything,
but all this repeated string-building just feels wrong for me. Are
there more concise ways of writing this?

I tried (setf (subseq ...)) and (replace ...), but they only replace
same-length subsequences, as far as I can see. (Perhaps it's only
Perl withdrawal; mostly I am glad that I laid off the habit, but the
Regexps are handy sometimes.)

Thanks,

Rudi

Gareth McCaughan

unread,
Mar 20, 2000, 3:00:00 AM3/20/00
to
Rudolf Schlatte wrote:

> I am doing some string hacking right now, and would be grateful if any
> of you could spare some comments on my code.
>
> Here is a typical routine, it escapes every character of a given set
> in its input string:

..


>
>
> (defun escape-characters (latex-string &optional (chars "$&%#_{}"))
> (let ((last-found 0)
> (new-string (copy-seq latex-string)))
> (while last-found
> (setf last-found (position-if (lambda (char) (find char chars))
> new-string :start last-found))
> (when last-found
> (setf new-string (concatenate 'string
> (subseq new-string 0 last-found)
> (string #\\)
> (subseq new-string last-found))
> ;; don't want to loop forever
> last-found (+ 2 last-found))))
> new-string))
>
> This works ok and the program does not have to run fast or anything,
> but all this repeated string-building just feels wrong for me. Are
> there more concise ways of writing this?

I suspect that the most concise way goes something like
this:

(defun escape-characters (latex-string &optional (chars "$&%#_{}")

(escaper #\\))
(coerce (mapcan (lambda (char)
(if (find char chars) (list escaper char) (list char)))
(coerce latex-string 'list))
'string))

Conses hideously, of course.

> I tried (setf (subseq ...)) and (replace ...), but they only replace
> same-length subsequences, as far as I can see. (Perhaps it's only
> Perl withdrawal; mostly I am glad that I laid off the habit, but the
> Regexps are handy sometimes.)

CL could use some extra string and sequence functions.
It would, for instance, be nice to have something bearing
the same relation to MAPCAN as MAP does to MAPCAR.
(Whatever, exactly, that ought to mean.)

--
Gareth McCaughan Gareth.M...@pobox.com
sig under construction

Pierre R. Mai

unread,
Mar 20, 2000, 3:00:00 AM3/20/00
to
Rudolf Schlatte <rsch...@ist.tu-graz.ac.at> writes:

> I am doing some string hacking right now, and would be grateful if any
> of you could spare some comments on my code.
>
> Here is a typical routine, it escapes every character of a given set
> in its input string:
>
>

> (defun escape-characters (latex-string &optional (chars "$&%#_{}"))
> (let ((last-found 0)
> (new-string (copy-seq latex-string)))
> (while last-found
> (setf last-found (position-if (lambda (char) (find char chars))
> new-string :start last-found))
> (when last-found
> (setf new-string (concatenate 'string
> (subseq new-string 0 last-found)
> (string #\\)
> (subseq new-string last-found))
> ;; don't want to loop forever
> last-found (+ 2 last-found))))
> new-string))
>
>
> This works ok and the program does not have to run fast or anything,
> but all this repeated string-building just feels wrong for me. Are
> there more concise ways of writing this?

In escaping or unescaping, you'll have to live with some form of
"ugliness": Either consing up new strings, or moving stuff around in
a non-simple string (or some other intermediate data-structure), or
doing a prepass to calculate the lengthening or shortening upfront.

Stuff like regexps just hides this ugliness beneath a slightly less
ugly facade (and introduces it's own ugliness).

I most often just do the prepass, which is a little count-if and
multiply, and then do the substitution/copying in place in the
allocated result buffer. This reduces consing to a minimum, and
keeps the code half-way simple:

(defun escape-string (string escape-char chars)
(flet ((escape-p (char) (find char chars)))
(let ((escape-count (count-if #'escape-p string)))
(if (zerop escape-count)
string ;; Possibly copy if a real copy is always wanted
(do* ((length (length string))
(result (make-string (+ length escape-count)))
(result-pos 0 (1+ result-pos))
(source-pos 0 (1+ source-pos)))
((= source-pos length) result)
(let ((source-char (char string source-pos)))
(when (escape-p source-char)
(setf (char result result-pos) escape-char)
(incf result-pos))
(setf (char result result-pos) source-char)))))))

Regs, Pierre.

--
Pierre Mai <pm...@acm.org> PGP and GPG keys at your nearest Keyserver
"One smaller motivation which, in part, stems from altruism is Microsoft-
bashing." [Microsoft memo, see http://www.opensource.org/halloween1.html]

Erik Naggum

unread,
Mar 20, 2000, 3:00:00 AM3/20/00
to
* Rudolf Schlatte

| This works ok and the program does not have to run fast or anything, but
| all this repeated string-building just feels wrong for me. Are there
| more concise ways of writing this?

yes, use with-output-to-string and write characters and strings to the
string stream with stream primitives like write-char and write-string.
the underlying buffering mechanism is very well optimized for growing
strings, which is what you are doing in an excessively expensive way.

C and Perl don't have string streams, so every C and Perl programmer
thinks in-memory string processing is more efficient than streams and
that you always benefit from doing your own buffering algorithm. neither
is, was, or will become true. languages shape the way we think, or don't.

#:Erik

Rudolf Schlatte

unread,
Mar 20, 2000, 3:00:00 AM3/20/00
to

Thanks to all of you who replied. It is interesting that the three
answers are on the extremes of the solution-space, as I see it:

- the elegant but highly wasteful expression
- the somewhat long-winded but efficient way
- the surprising solution from another angle

It was all I could hope for. Thanks again.

Erik Naggum <er...@naggum.no> writes:

[...]


> C and Perl don't have string streams, so every C and Perl programmer
> thinks in-memory string processing is more efficient than streams and
> that you always benefit from doing your own buffering algorithm. neither
> is, was, or will become true. languages shape the way we think, or don't.

I am aware of that. The nice thing about Lisp is that I can trust my
gut feeling of "it works, but it's somehow not right", and also the
other one of "now _this_ is *solid*".

Rudi

Fernando D. Mato Mira

unread,
Mar 20, 2000, 3:00:00 AM3/20/00
to
Gareth McCaughan wrote:

> CL could use some extra string and sequence functions.
> It would, for instance, be nice to have something bearing
> the same relation to MAPCAN as MAP does to MAPCAR.

Things I find missing:

1. tree versions of delete/remove functions.
2. Mutiple-value versions of MAPCAR et al.
3. regexp versions of nsubst and subst (more generally, subst-if and nsubst-if
versions that take a new element generator function argument).

--
Fernando D. Mato Mira
Real-Time SW Eng & Networking
Advanced Systems Engineering Division
CSEM
Jaquet-Droz 1 email: matomira AT acm DOT org
CH-2007 Neuchatel tel: +41 (32) 720-5157
Switzerland FAX: +41 (32) 720-5720

www.csem.ch www.vrai.com ligwww.epfl.ch/matomira.html


Bernhard Pfahringer

unread,
Mar 21, 2000, 3:00:00 AM3/21/00
to
Rudolf Schlatte <rsch...@ist.tu-graz.ac.at> writes:

> Thanks to all of you who replied. It is interesting that the three
> answers are on the extremes of the solution-space, as I see it:
>
> - the elegant but highly wasteful expression
> - the somewhat long-winded but efficient way
> - the surprising solution from another angle
>
> It was all I could hope for. Thanks again.
>

Actually judging efficiency can be tricky, e.g. consider:

(defun foo (string escape-fn escape-char)
(with-output-to-string (result-string)
(loop for char across string
when (funcall escape-fn char)
do (write-char escape-char result-string)
do (write-char char result-string))
result-string))

This (IMHO elegant) version has the property that ESCAPE-FN is only called once
for each character. The long-winded version needs to call it twice or instead allocate
another data-structure for recording these results. So depending on the cost of calling
ESCAPE-FN and the cost of having one more data-structure around, there are cases
where the elegant solution will even be more efficient as well.

cheers, Bernhard
(yes I've moved to the other of the world, if somebody should wonder :-)

Reply all
Reply to author
Forward
0 new messages