Substring?

glauber

Aug 9, 2001, 5:39:06 PM
Is there a better way to get a "substring" than using subseq?

g

Kent M Pitman

Aug 9, 2001, 6:21:55 PM
thegl...@my-deja.com (glauber) writes:

> Is there a better way to get a "substring" than using subseq?

What would count as "better"?

SUBSEQ is the intended answer.

You can use (SUBSEQ (THE STRING x)) if you want to be more perspicuous
about its stringness.

glauber

Aug 10, 2001, 9:59:04 AM
Kent M Pitman <pit...@world.std.com> wrote in message news:<sfw4rrg...@world.std.com>...

I was hoping there would be either a specific string version or a way
to copy characters from a string without consing. I'm trying to speed
up a program that loops through a few million records in a large text
file and counts them by certain field positions. I'm using a hash to
do the counts, but i have to cons a substring every time just to do
the hash lookup.

I can create the string once with make-string, but can't find a way to
copy the characters i want into my string (other than hand-coding it
with a loop, which is probably not the best answer since i'm using
CLisp).

There is copy-seq, but it copies the whole sequence. If that had
:start and :stop parameters, it would do what i want.

I really hoped that there would be a function to copy an "array slice"
that would map in a low level to a memcpy, but there doesn't seem to
be anything in the arrays dictionary in the spec.

Since these are small strings i'll code the copy with aref and setf by
unrolling the copy loop.


g

Barry Margolin

Aug 10, 2001, 11:48:21 AM
In article <892f97d1.01081...@posting.google.com>,

glauber <thegl...@my-deja.com> wrote:
>I was hoping there would be either a specific string version or a way
>to copy characters from a string without consing. I'm trying to speed
>up a program that loops through a few million records in a large text
>file and counts them by certain field positions. I'm using a hash to
>do the counts, but i have to cons a substring every time just to do
>the hash lookup.

All of the sequence functions take :START and :END parameters. Perhaps
instead of copying the substrings, you could just remember the field
positions, and access the original string using those positions.
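
A minimal sketch of this suggestion: compare or search a field in place with the :START/:END style keywords instead of extracting it first. The record layout and the names *RECORD* and FIELD-EQUAL-P are made up for illustration.

```lisp
;; Hypothetical fixed-width record: columns 0-4 hold a code, 5-9 a region.
(defparameter *record* "AB123NORTH0042")

(defun field-equal-p (record start end literal)
  "True if RECORD[START,END) equals LITERAL, without consing a substring."
  (string= record literal :start1 start :end1 end))

(field-equal-p *record* 5 10 "NORTH")                ; true
(position #\0 *record* :start 10)                    ; first #\0 at or after index 10
(count-if #'digit-char-p *record* :start 2 :end 5)   ; digits inside columns 2-4
```

None of these calls allocates a new string; they only read the original record between the given positions.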

--
Barry Margolin, bar...@genuity.net
Genuity, Woburn, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.

Kent M Pitman

Aug 10, 2001, 12:00:57 PM
thegl...@my-deja.com (glauber) writes:

> Kent M Pitman <pit...@world.std.com> wrote in message news:<sfw4rrg...@world.std.com>...
> > thegl...@my-deja.com (glauber) writes:
> >
> > > Is there a better way to get a "substring" than using subseq?
> >
> > What would count as "better"?
> >
> > SUBSEQ is the intended answer.
> >
> > You can use (SUBSEQ (THE STRING x)) if you want to be more perspicuous
> > about its stringness.
>
> I was hoping there would be either a specific string version

Why? (SUBSEQ (THE STRING x) ...sorry i forgot the index args...)
is the specific string version. CL has a quite large number of
vector types--would they each have to have their own name?
If it matters that much to you, just do:

(declaim (inline substring))
(defun substring (string &rest args)
  (declare (type string string))
  (apply #'subseq string args))

or write the equivalent macro if you don't want to trust some implementations
to do inlining. (I've heard recently to my surprise that Franz does not
do inlining in Allegro??)

> or a way
> to copy characters from a string without consing.

This is a completely different function you didn't ask about. You want
MAP-INTO for this. It is also a general-purpose sequence function but
should not cons.

You should also look at REPLACE, which takes ranges.

LOOP also lets you write this trivially.
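
As a rough illustration of REPLACE used as a ranged copy into a preallocated string: the names *FIELD* and COPY-FIELD and the five-character width are assumptions for the example, not part of the original program.

```lisp
;; One preallocated destination, reused for every record.
(defparameter *field* (make-string 5))

(defun copy-field (record start)
  "Copy (length *field*) characters of RECORD, beginning at START, into *FIELD*."
  (replace *field* record
           :start2 start
           :end2 (+ start (length *field*))))

(copy-field "AB123NORTH0042" 5)   ; *field* now holds "NORTH"
```

REPLACE returns the destination sequence itself, so the result is the same string object on every call.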

Your homework assignment: READ THE SEQUENCE CHAPTER IN CLHS.
ALL OF THOSE FUNCTIONS APPLY TO STRINGS, VECTORS (including BIT-VECTORS),
LISTS, ETC.

> I'm trying to speed
> up a program that loops through a few million records in a large text
> file and counts them by certain field positions. I'm using a hash to
> do the counts, but i have to cons a substring every time just to do
> the hash lookup.

No, you don't.

Accumulate the token into an adjustable array (the value of a global
variable that you reuse or that is resourced in some way, so you
don't cons a new one every time). Then do:

;; buffer here is your adjustable string with fill-pointer
;; it is not suitable for storing but IS suitable for lookup
;; (*token-table* must be an EQUAL hash table for string keys to match)
(defun lookup-token (buffer)
  (or (gethash buffer *token-table*)
      ;; Your only consing goes here
      (let ((new-token-name (copy-seq buffer)))
        (setf (gethash new-token-name *token-table*)
              (make-a-token new-token-name)))))
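
One way the same idea might look for the counting problem described in this thread, as a sketch only. The names *COUNTS*, *KEY-BUFFER* and COUNT-FIELD are hypothetical, and each table value is assumed to be a (key . count) cell so that the shared buffer is never stored as a key.

```lisp
;; Assumption: keys are compared by content, so the table uses EQUAL.
(defparameter *counts* (make-hash-table :test #'equal))

;; Reusable lookup buffer: adjustable, with a fill pointer.
(defparameter *key-buffer*
  (make-array 16 :element-type 'character :adjustable t :fill-pointer 0))

(defun count-field (record start end)
  "Bump the count for RECORD[START,END) and return it; cons only for new keys."
  (let ((width (- end start)))
    ;; Grow the buffer if this field is wider than anything seen so far.
    (when (> width (array-dimension *key-buffer* 0))
      (setf *key-buffer* (adjust-array *key-buffer* width)))
    (setf (fill-pointer *key-buffer*) width)
    (replace *key-buffer* record :start2 start :end2 end)
    (let ((entry (gethash *key-buffer* *counts*)))
      (if entry
          (incf (cdr entry))
          ;; New key: store a fresh copy, never the shared buffer itself.
          (let ((key (copy-seq *key-buffer*)))
            (cdr (setf (gethash key *counts*) (cons key 1))))))))
```

EQUAL compares strings element by element and honours the fill pointer, which is why the reusable buffer can serve as a lookup key even though it must not be stored.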



> I can create the string once with make-string, but can't find a way to
> copy the characters i want into my string (other than hand-coding it
> with a loop, which is probably not the best answer since i'm using
> CLisp).

MAKE-STRING isn't going to make you an adjustable array. Unless you
have ESP about how long a string you'll need, this is going to be a bad
plan regardless. You need an extensible buffer for accumulating token
names correctly.



> There is copy-seq, but it copies the whole sequence. If that had
> :start and :stop parameters, it would do what i want.

You should also look at REPLACE. By the way, the parameter is consistently
called :END, not :STOP, throughout the language.

> I really hoped that there would be a function to copy an "array slice"
> that would map in a low level to a memcpy, but there doesn't seem to
> be anything in the arrays dictionary in the spec.

You are backward chaining on your specific problem.

Invest in yourself. Do some forward chaining and read the chapters
you might need.

SEQUENCE is a supertype of array so all sequence functions apply to arrays.

REPLACE is probably what you are looking for but didn't actually ask for.



> Since these are small strings i'll code the copy with aref and setf by
> unrolling the copy loop.

Properly declared I think you'll do ok with that. But REPLACE or MAP-INTO
are probably bummed to be faster. REPLACE is probably what you need
because it manages start/stop.

If you meter it and find consing, it may be the use of keyword args, not the
actual replacement that is consing. In that case, ask your vendor to
provide a compiler optimizer that special-cases that argument configuration.
I don't really see why all arg conventions couldn't be dynamically optimized,
but I certainly feel the common cases should be.

If it still ends up consing, fall back to LOOP.

Wolfhard Buß

Aug 10, 2001, 12:42:51 PM
Kent M Pitman <pit...@world.std.com> writes:

> Your homework assignment: READ THE SEQUENCE CHAPTER IN CLHS.

:

> SEQUENCE is a supertype of array so all sequence functions apply to arrays.

For the interpretation of

(subtypep 'array 'sequence)
=> NIL T

look at the fine CLHS of Kent Pitman ;-)


-wb

glauber

Aug 10, 2001, 1:56:44 PM
thegl...@my-deja.com (glauber) wrote in message news:<892f97d1.01081...@posting.google.com>...

[...]

> I was hoping there would be either a specific string version or a way
> to copy characters from a string without consing. I'm trying to speed
> up a program that loops through a few million records in a large text
> file and counts them by certain field positions. I'm using a hash to
> do the counts, but i have to cons a substring every time just to do
> the hash lookup.

Well...
I create the strings once now with make-string, and use read-sequence
to read from the file into these strings. No consing there.

However... there was no speedup. I'm thinking now that the slowness is
in the hash table lookups and updates.

This version makes that worse, by the way, because i have to first
check if the information is already in the hash. If i'm creating a new
hash entry, then i have to cons a new string to serve as hash key. If
not, then just retrieve the count and increment it.

Oh well, another case where the garbage collector outperforms my puny
attempts to optimize.

It seems to me that the allocator and garbage collector in CLisp are
very good.

Maybe the hash tables are not as fast as they could be (not as fast as
Perl, for example, as much as i hate bringing this up).

g

Kent M Pitman

Aug 10, 2001, 2:15:25 PM
wb...@gmx.net (Wolfhard Buß) writes:

Sorry, I was speaking conversationally. 1-d arrays are vectors.

(subtypep 'vector 'sequence) => T T

Yes, you're right that the sequence functions don't apply to n-dimensional
arrays (other than n=1).
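
The type relationships in question can be checked at a REPL; this is a small illustration using only standard operators.

```lisp
(subtypep 'string 'vector)      ; => T, T
(subtypep 'vector 'sequence)    ; => T, T
(subtypep 'list 'sequence)      ; => T, T
(subtypep 'array 'sequence)     ; => NIL, T

;; A two-dimensional array is an ARRAY but not a SEQUENCE.
(typep (make-array '(2 3)) 'sequence)   ; => NIL
(typep (make-array 3) 'sequence)        ; => T
```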

Erik Naggum

Aug 10, 2001, 2:18:04 PM
* thegl...@my-deja.com (glauber)

> I really hoped that there would be a function to copy an "array slice"
> that would map in a low level to a memcpy, but there doesn't seem to
> be anything in the arrays dictionary in the spec.

You can do better than that. An adjustable, displaced array may be your
answer. Suppose you have a string and need to compute a hash value of a
substring and the hash function does not take :start and :end arguments.
You could wrap the call to the hash function in something like this:

(defun funcall-with-substring (function string &key (start 0) end)
  (let ((substring (load-time-value
                    (make-array 0 :element-type 'character :adjustable t))))
    (funcall function
             (adjust-array substring (- (or end (length string)) start)
                           :displaced-to string
                           :displaced-index-offset start))))
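
To see why a displaced array avoids copying, here is a small sketch in which a displaced string shares storage with the string it points into. The variables *LINE* and *VIEW* are illustrative names, and both arrays are explicitly given element type CHARACTER so that the displacement is valid.

```lisp
(defparameter *line*
  (make-array 14 :element-type 'character
                 :initial-contents "AB123NORTH0042"))

;; A five-character window onto *LINE*, starting at index 5.
(defparameter *view*
  (make-array 5 :element-type 'character
                :displaced-to *line*
                :displaced-index-offset 5))

*view*                        ; reads as "NORTH"; no characters were copied
(setf (char *line* 5) #\S)    ; change the underlying string...
(char *view* 0)               ; ...and the window now starts with #\S
(array-displacement *view*)   ; two values: *LINE* and the offset 5
```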

///

Kent M Pitman

Aug 10, 2001, 2:19:38 PM
thegl...@my-deja.com (glauber) writes:

> Maybe the hash tables are not as fast as they could be (not as fast as
> Perl, for example, as much as i hate bringing this up).

Hash table speed is a property of implementations, not of languages.

Even for Perl, which I've been told has only one significant implementation,
I assume there are different versions and the versions could vary in speed.

Always cite implementation and version, as well as relevant declarations
(viz, the OPTIMIZE declaration and associated qualities in CL) when
comparing speed. Otherwise, such remarks are meaningless.
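
A short sketch of what reporting that information might look like. COUNT-SPACES is a made-up function standing in for whatever is being timed, and the particular OPTIMIZE qualities are only an example setting.

```lisp
;; Identify the implementation being measured.
(format t "~A ~A~%"
        (lisp-implementation-type)
        (lisp-implementation-version))

;; State the optimization qualities the timed code was compiled under.
(defun count-spaces (string)
  (declare (optimize (speed 3) (safety 1) (debug 0))
           (type simple-string string))
  (count #\Space string))

;; TIME prints implementation-specific timing (and often consing) figures.
(time (count-spaces (make-string 1000000 :initial-element #\Space)))
```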

Kent M Pitman

Aug 10, 2001, 2:40:11 PM
Erik Naggum <er...@naggum.net> writes:

Hmm. Hadn't thought to do this with displaced arrays.

This isn't re-entrant. And keywords often cons. So I might suggest:

(defun make-substring-buffer ()
  (make-array 0 :element-type 'character :adjustable t))

(defvar *substring-buffer* (make-substring-buffer))

(defun funcall-with-substring (function string &optional (start 0) end)
  (let ((substring (or *substring-buffer*
                       (setq *substring-buffer* (make-substring-buffer))))
        (*substring-buffer* nil))
    (funcall ...etc.)))

One thing to watch out for is that the element-type of the substring buffer
and of the buffer to displace to must match. You could keep a cache
per element-type.

You might also want to clean up the string to not hold onto a pointer
to the underlying buffer (for gc reasons) after, too, perhaps in an
unwind-protect. Then again, that might slow things down and it depends
on how "tightly coupled" this routine is with the knowledge that the
consumer will be the same old buffer every time as to whether you might
omit it.

Likewise the issue of reentrancy is something you may not care about, but
I've found it's sufficiently fast to do the above little hack that
there's little point to not doing at least that, if not using a real
resource facility (which might not be adequately fast).

The code I offered probably isn't strong enough to work in a multi-tasking
application.
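
One possible way to fill in the whole function, offered as a sketch rather than as the code the poster had in mind. The name CALL-WITH-SUBSTRING is hypothetical, STRING is assumed to have element type CHARACTER, and the UNWIND-PROTECT cleanup is the optional step discussed above, so it can be dropped if its cost matters.

```lisp
(defun make-substring-buffer ()
  (make-array 0 :element-type 'character :adjustable t))

(defvar *substring-buffer* (make-substring-buffer))

(defun call-with-substring (function string &optional (start 0) end)
  "Call FUNCTION on a displaced view of STRING[START,END).
Assumes STRING has element type CHARACTER. Not safe across threads."
  (let ((substring (or *substring-buffer* (make-substring-buffer)))
        ;; Nested calls see NIL here and fall back to a fresh buffer.
        (*substring-buffer* nil))
    (unwind-protect
         (funcall function
                  (adjust-array substring (- (or end (length string)) start)
                                :displaced-to string
                                :displaced-index-offset start))
      ;; Drop the reference to STRING so it can be collected.
      (adjust-array substring 0))))
```

The view is only valid while FUNCTION runs; anything that needs to outlive the call should be copied with COPY-SEQ inside FUNCTION.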

Thomas F. Burdick

Aug 10, 2001, 2:57:51 PM
thegl...@my-deja.com (glauber) writes:

> Well...
> I create the strings once now with make-string, and use read-sequence
> to read from the file into these strings. No consing there.
>
> However... there was no speedup. I'm thinking now that the slowness is
> in the hash table lookups and updates.
>
> This version makes that worse, by the way, because i have to first
> check if the information is already in the hash. If i'm creating a new
> hash entry, then i have to cons a new string to serve as hash key. If
> not, then just retrieve the count and increment it.
>
> Oh well, another case where the garbage collector outperforms my puny
> attempts to optimize.

Out of curiosity, did you try profiling your code before you optimized
it? :) It seems like when I leave the profiling stage out ("bah, it's
obvious where the critical parts of this are"), about 1/3 of the time
I guess completely wrong and slow the code down by optimizing
non-critical bits at the slight expense of critical bits.

Thomas F. Burdick

Aug 10, 2001, 2:58:36 PM
Kent M Pitman <pit...@world.std.com> writes:

Well, he did mention the implementation: CLISP, he just used funny
capitalization (CLisp).

Kent M Pitman

Aug 10, 2001, 3:53:02 PM

Though even then not the version. It's not fair to CLISP if they fix
the speed as a bug to have this still sitting around in Deja News for
people to find and confuse them.

glauber

Aug 10, 2001, 5:38:57 PM
t...@conquest.OCF.Berkeley.EDU (Thomas F. Burdick) wrote in message news:<xcv3d6z...@conquest.OCF.Berkeley.EDU>...
[...]

> Out of curiosity, did you try profiling your code before you optimized
> it? :) It seems like when I leave the profiling stage out ("bah, it's
> obvious where the critical parts of this are"), about 1/3 of the time
> I guess completely wrong and slow the code down by optimizing
> non-critical bits at the slight expense of critical bits.

You're completely right, of course.
In the end this was a waste of time, because the consing was not the
problem. But i didn't waste too much time, and i learned a couple of
good tricks that i'll remember next time. Well maybe profiling was
exactly what i was doing! :-)

g

glauber

Aug 10, 2001, 5:48:42 PM
Kent M Pitman <pit...@world.std.com> wrote in message news:<sfw8zgr...@world.std.com>...

[...]


> REPLACE is probably what you are looking for but didn't actually ask for.

[...]

Replace is exactly what i was looking for, thanks!
I did ask for it, remember i was looking for something similar to the
C function memcpy. The names of Common Lisp functions are not always
obvious to people raised on C and friends.

I was looking for something specific to strings thinking that Lisp was
maybe wasting time trying to figure out what sequence type before
doing the operation, but "the" should take care of that.

Many thanks to Erik also for the trick with displaced arrays! That's a
way to do what i want without having to copy anything. Of course there
must be some overhead too, but a simple timing loop should tell.

Thanks again,

glauber

Marco Antoniotti

Aug 10, 2001, 10:25:26 PM

... and, AFAIU, writing an APL 'reshape' operation ain't that trivial.

Cheers


--
Marco Antoniotti ========================================================
NYU Courant Bioinformatics Group tel. +1 - 212 - 998 3488
719 Broadway 12th Floor fax +1 - 212 - 995 4122
New York, NY 10003, USA http://bioinformatics.cat.nyu.edu
"Hello New York! We'll do what we can!"
Bill Murray in `Ghostbusters'.

glauber

Aug 11, 2001, 11:31:09 AM
Kent M Pitman <pit...@world.std.com> wrote in message news:<sfw3d6z...@world.std.com>...
[...]

> Though even then not the version. It's not fair to CLISP if they fix
> the speed as a bug to have this still sitting around in Deja News for
> people to find and confuse them.

OK, CLISP 2.26. Anything else you want to know?

Funny how a simple question starts a silly flamewar. I probably got
you in a bad day, sorry.

Thanks for non-flaming part of your reply; it was useful.

glauber

Frode Vatvedt Fjeld

Aug 14, 2001, 5:11:03 AM
Kent M Pitman <pit...@world.std.com> writes:

> (defun funcall-with-substring (function string &optional (start 0) end)
>   (let ((substring (or *substring-buffer*
>                        (setq *substring-buffer* (make-substring-buffer))))
>         (*substring-buffer* nil))
>     (funcall ...etc.)))

> [..]

> The code I offered probably isn't strong enough to work in a
> multi-tasking application.

Exactly, there is a race-condition in there, and you don't want
inter-thread synchronization unless you know you'll need it. Doesn't
this suggest that functionality such as this is not suitable for
wrapping into neat abstractions like FUNCALL-WITH-SUBSTRING? There are
subtle problems and trade-offs that the user needs to know about, and
any ~5-liner that needs to make this many trade-offs isn't worth
it. Sort of like there's not much point in having a SPLIT-STRING cast
in concrete.

--
Frode Vatvedt Fjeld

Kent M Pitman

Aug 14, 2001, 12:19:12 PM
Frode Vatvedt Fjeld <fro...@acm.org> writes:

> Kent M Pitman <pit...@world.std.com> writes:
>
> > (defun funcall-with-substring (function string &optional (start 0) end)
> >   (let ((substring (or *substring-buffer*
> >                        (setq *substring-buffer* (make-substring-buffer))))
> >         (*substring-buffer* nil))
> >     (funcall ...etc.)))
>
> > [..]
>
> > The code I offered probably isn't strong enough to work in a
> > multi-tasking application.
>
> Exactly, there is a race-condition in there, and you don't want
> inter-thread synchronization unless you know you'll need it. Doesn't
> this suggest that functionality such as this is not suitable for
> wrapping into neat abstractions like FUNCALL-WITH-SUBSTRING?

No, it just means you have to document the restriction. And if you were
a Lisp vendor you would make it work reliably rather than publish a silly
restriction like this. If you're just an application developer, it's routine
to have generic named stuff like this that comes with caveats. That's called
"good engineering". I wrote such a thing the other day that wasn't correctly
interlocked but where the cost of doing all the interrupt management for
my application vs the cost of just "letting it do the wrong thing" (where
that action was bounded and foreseeable) made it not worth the effort to do
the really "right" thing. But always, I recommend just adding a comment
in the code (and doc, if any) saying there's a pitfall. And it's never
a bad time to think about packaging up your thoughts. After all, the
implementation exists only once and can be fixed. The function name suggests
the abstraction and there may be many callers of the abstraction; they won't
be bothered if you later replace the abstraction's implementation with a
better one. The chosen name and calling sequence help you remember what
you had intended. The last thing you want is a name like
FUNCALL-WITH-SUBSTRING-BUT-OCCASIONALLY-LOSE-IN-A-MULTITASKING-ENVIRONMENT
It isn't what your code's clients want, and is just not helpful in any way
as a compromise.

> There are
> subtle problems and trade-offs that the user needs to know about,

Entirely dependent on who the user is. I do not always assume that the
people I'm helping with advice here are making general-purpose programs
to sell to others. Sale comes with a warranty and they'd better figure out
for themselves what they can warranty and what they can't. I'm just
offering ways to "get stuff done", which is what most application developers
need. And along the way I'm pointing out "additional issues" that they might
want to think about when they get more advanced. Your mileage may vary.

When it comes right down to it, there's an equivalent question as to whether
(setf (get ...) ...) is interlocked against multitasking. Since CL has no
multitasking, that's an issue it's up to the vendor to consider. But given
that most CL programs are also not synchronized against this, fussing over
my FUNCALL-WITH-SUBSTRING hardly seems the place to start.

> and
> any ~5-liner that needs to make this many trade-offs isn't worth
> it.

I disagree there strongly. It depends entirely on how many times it's
called, not on how long the function is.

> Sort of like there's not much point in having a SPLIT-STRING cast
> in concrete.

Now here I somewhat disagree again, but for a different reason.

SPLIT-STRING is *plainly* something that is dancing around an idiom common
to LOTS of programming. The closer we come as a community to having a
normative way of writing this idiom, the less vendor-specific compilers will
have to recognize every possible long-hand rendition of this operation.
There are a lot of ways people can *implement* SPLIT-STRING, but many fewer
typical ways that people will call it. The typical calls may be
optimizable in ways that don't even go through the function, taking advantage
of low-level system-specific optimizations not accessible to portable code.

SPLIT-STRING *is* something that I think is used enough that it should
eventually be upgraded to "language" status. FUNCALL-WITH-SUBSTRING might
be, but the case is more difficult to make, just on sheer frequency of
use. So that's why I see it more worth spending community time talking
about SPLIT-STRING -- because it's something it would be nice for us all to
get done talking about and move on from, with only "use of it" to follow.

Erik Naggum

Aug 14, 2001, 9:24:18 PM
* Kent M Pitman <pit...@world.std.com>

> Then again, that might slow things down and it depends on how "tightly
> coupled" this routine is with the knowledge that the consumer will be the
> same old buffer every time as to whether you might omit it.

It was an attempt to show how things could be used. My purpose was not
to post a function that people should stuff in their private collection
of code gems and hope that it would save them someday, which I think
"something like this" would have indicated. I posted it as a function
instead of these abominable patterns/idioms that people seem to like, but
really, it was intended as nothing more than to demonstrate the use of
displaced arrays and load-time-value to avoid consing up those arrays.

> The code I offered probably isn't strong enough to work in a multi-tasking
> application.

I should learn to put in disclaimers like this, too. :)

///

j...@itasoftware.com

Sep 5, 2001, 10:51:18 AM
Kent M Pitman <pit...@world.std.com> writes:

> or write the equivalent macro if you don't want to trust some implementations
> to do inlining. (I've heard recently to my surprise that Franz does not
> do inlining in Allegro??)

This is true.

In Franz documentation section 10.1, Inline declarations ignored, they
state:
The INLINE declaration is ignored by the compiler. User defined
functions are never compiled inline.

They do open-code some primitives, but the list of primitives is not
extensible.

You may also be surprised to learn that Franz doesn't do too much in
the way of constant folding:

(car '(a b c)) will generate instructions to move the list (a b c)
into a register, then dereference the car of that list.

(schar "this is a test" 7) generates instructions to move the constant
string into a register, dereference the appropriate word, then
shift it into position.

(coerce 42 'fixnum) generates a runtime call to COERCE

Geoff Summerhayes

Sep 5, 2001, 3:55:27 PM

<j...@itasoftware.com> wrote in message
news:n149k7...@itasoftware.com...

<newbie>

Am I correct in understanding that it's not possible to do too much
about this? define-compiler-macro looks like a reasonable idea but
the HS says results are undefined for functions in the common-lisp
package. (Although it _appears_ to work in LW)

</newbie>

Geoff

Kent M Pitman

Sep 5, 2001, 8:17:45 PM
"Geoff Summerhayes" <sNuOmS...@hNoOtSmPaAiMl.com> writes:

> Am I correct in understanding that it's not possible to do too much
> about this [what vendors do optimization-wise]?

Whether you can do anything about it depends on the vendor. Both in what
they have implemented but also in how responsive to customer needs they
are.

However, a mere request does not imply need. Vendors get requests for all
manner of things on whim from various people. Make sure in your request
to motivate why you need something.

> define-compiler-macro looks like a reasonable idea but
> the HS says results are undefined for functions in the common-lisp
> package. (Although it _appears_ to work in LW)
>
> </newbie>

The reasons you ought not do this are:

(1) The standard says so.

(2) You cannot know whether the compiler is planning to optimize
system functions already in other ways, and you may be pessimizing
the function by clobbering important functionality that comes
pre-packaged in the system.

(3) You're not the only package that might have an interest in doing this,
so you'd either clobber or be clobbered by another package's attempt
to do the same.

(4) You might make a mistake and make the whole system melt down.
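
For contrast, a compiler macro on a function in your own package is well defined. The sketch below hand-inlines a string-specific wrapper around SUBSEQ; FIELD-SUBSTRING is a hypothetical name, and whether the expansion helps at all depends on the implementation.

```lisp
(defun field-substring (string start end)
  "Ordinary function, still available for FUNCALL and APPLY."
  (subseq (the string string) start end))

;; Defined on our own symbol, not on one in the COMMON-LISP package.
;; The compiler may use this expansion for direct calls, or ignore it.
(define-compiler-macro field-substring (string start end)
  `(subseq (the string ,string) ,start ,end))

(field-substring "AB123NORTH0042" 5 10)   ; => "NORTH"
```

The expansion evaluates its arguments once each and in the original order, so a call behaves the same whether or not the compiler applies it.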

Geoff Summerhayes

Sep 6, 2001, 12:24:33 PM

"Kent M Pitman" <pit...@world.std.com> wrote in message
news:sfwk7zd...@world.std.com...

> "Geoff Summerhayes" <sNuOmS...@hNoOtSmPaAiMl.com> writes:
>
> > define-compiler-macro looks like a reasonable idea but
> > the HS says results are undefined for functions in the common-lisp
> > package. (Although it _appears_ to work in LW)
> >
> > </newbie>
>
> The reasons you ought not do this are:
>
> (1) The standard says so.
>
> (2) You cannot know whether the compiler is planning to optimize
> system functions already in other ways, and you may be pessimizing
> the function by clobbering important functionality that comes
> pre-packaged in the system.
>
> (3) You're not the only package that might have an interest in doing
> this, so you'd either clobber or be clobbered by another package's
> attempt to do the same.
>
> (4) You might make a mistake and make the whole system melt
> down.

(1) It's a bad habit but I have this reverse psychology thing when
people say don't do that. I blame it on my C background, all the
interesting functions _are_ non-standard.
(2) A very good reason but I can see what the compiler is producing
and make an educated guess as to what it's doing. (pompous little git, me)
(3) Hmm... maybe (if (null (compiler-macro-function name))...
(4) Did I mention my C background? Meltdowns `R' Us :-)

Seriously though, I would not attempt this other than in
my own little private sandbox where the occasional puff
of purple smoke followed by a quiet `bammff' goes largely
unnoticed. I just can't resist taking things apart and
sticking my fingers near the power supply...again.

Geoff

Kent M Pitman

Sep 6, 2001, 1:02:19 PM
"Geoff Summerhayes" <sNuOmS...@hNoOtSmPaAiMl.com> writes:

> > The reasons you ought not do this are:
> >
> > (1) The standard says so.

> (1) It's a bad habit but I have this reverse psychology thing when
> people say don't do that. I blame it on my C background, all the
> interesting functions _are_ non-standard.

We can't protect you from yourself. However, in general, you should really
get used to the idea that in Lisp, unlike in C, anything the standard says
don't do is license for the vendor to seriously screw you with some clever
optimization that really depends on your not doing this. Being non-standard
is different than being anti-standard. Lisp leaves room for things you can
do that are not authorized by the standard, but that's different than things
you are told not to do.

> > (2) You cannot know whether the compiler is planning to optimize
> > system functions already in other ways, and you may be pessimizing
> > the function by clobbering important functionality that comes
> > pre-packaged in the system.

> (2) A very good reason but I can see what the compiler is producing
> and make an educated guess as to what it's doing.(pompous little git,me)

Bad plan. You mean to say "one of many compilers". If you ever try
to port your code or your vendor ever changes (I call this "porting
into the future") you will get screwed. So this is even more
pompous than you think and will result in code that is not durable.

> > (3) You're not the only package that might have an interest in doing
> > this, so you'd either clobber or be clobbered by another package's
> > attempt to do the same.

> (3) Hmm... maybe (if (null (compiler-macro-function name))...

Only if you assume others are doing the same. This is a prisoner's dilemma.
It's hard enough relying on others to merely follow the rules (e.g., see
your remarks in (1) above :-) without relying on them to spontaneously
invent and adhere to additional new ones...

> > (4) You might make a mistake and make the whole system melt
> > down.

> (4) Did I mention my C background? Meltdowns `R' Us :-)

You didn't need to move from C to Lisp to do this. C meltdowns are well
known to have a smaller footprint and less runtime overhead than Lisp ones.

> Seriously though, I would not attempt this other than in
> my own little private sandbox where the occasional puff
> of purple smoke followed by a quiet `bammff' goes largely
> unnoticed. I just can't resist taking things apart and
> sticking my fingers near the power supply...again.

People often do this thinking it will tell them something about "why".
But really it doesn't. The "why" you can learn only by static thought
about the whole space of things vendors and potential vendors might do.
You learn nothing but unwarranted boldness by doing this kind of thing
and succeeding. If you do it and fail, you learn only that the standard
was right in the first place.

No implementation is a reference implementation. So poking against one
is like trying to learn "human psychology" from "interviewing a human".
All you learn is to confuse yourself because the information you gain,
unless you plan to become a statistician and interview thousands, is
mostly worthless. And even if you interview everyone in the world, you
aren't guaranteed that the next one to be cranked out won't behave still
differently than every one you knew...

Tim Bradshaw

Sep 6, 2001, 1:25:52 PM
* Kent M Pitman wrote:

> You didn't need to move from C to Lisp to do this. C meltdowns are well
> known to have a smaller footprint and less runtime overhead than Lisp ones.

This is one of those persistent myths. Modern C-based systems can
produce really impressive core dumps. You didn't think that all that
work to allow 64bit file offsets in UFS was done for user files, did
you? Not to mention serious disk I/O: on a reasonable
enterprise-class machine you can even get your typical 20Gb core dump
from, say, cat, onto the disk in only a few minutes, which speeds up
development time no end.

--tim

Tim Bradshaw

Sep 6, 2001, 1:43:30 PM
* I wrote:

> [rant, rave, dribble]

And another thing, since I'm here. All those big enterprise-class SMP
machines that they want you to think are for running big databases?
They're lying. I just built gcc 3.0.1 from source today, and I
*seriously* need some more CPUs in my machine so I can use parallel
make and get more than one build done per day. The build creates
nearly 400Mb of files, on top of the nearly 90Mb of sources.
Completely insane.

--tim

Erik Naggum

Sep 6, 2001, 1:57:17 PM
* Kent M Pitman <pit...@world.std.com>
> Being non-standard is different than being anti-standard. Lisp leaves
> room for things you can do that are not authorized by the standard, but
> that's different than things you are told not to do.

I think the term "extra-standard" is better than "non-standard",
precisely for this reason.

///

Geoff Summerhayes

Sep 6, 2001, 6:03:55 PM

"Kent M Pitman" <pit...@world.std.com> wrote in message
news:sfwn148...@world.std.com...

>
> We can't protect you from yourself.

Agreed. :-)

<good arguments elided>

All of your arguments stem from the idea that portability
and upgrade potential are on my current goal list. They
aren't. It was just a set of one-offs to see if I could do
it with this version of LW (I can) and CLISP (I can set
the compiler macro but it is ignored during a compile).

Kind of fun really, I got practice with compiler macros
without having to write a function to base the macro on.

Geoff

Thomas F. Burdick

Sep 6, 2001, 10:37:19 PM
Tim Bradshaw <t...@cley.com> writes:

Yeah, someone hasn't done C development recently :). Something simple
like:

char a="foo";
a[3]='!';

when you haven't installed a SIGSEGV handler isn't likely to produce
too much, but "meltdown" implies something more dramatic. Hell, I
once had a C++-based meltdown whose core filled up all the available
disk space, which caused some error that the program exacerbated with
its "handler", which caused the kernel to panic, which killed the file
systems. Or so went the (unhappy) sysadmin's best piecing together of
events...

I have to say, I've not done anything *nearly* that dramatic in Lisp
(yet?)

Raymond Wiker

Sep 7, 2001, 3:56:43 AM
t...@conquest.OCF.Berkeley.EDU (Thomas F. Burdick) writes:

> Yeah, someone hasn't done C development recently :). Something simple
> like:
>
> char a="foo";
> a[3]='!';

This isn't going to compile, let alone run. If you're *really*
lucky, you might get a compiler meltdown instead :-)

--
Raymond Wiker
Raymon...@fast.no
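
For readers following the exchange: the snippet as posted is rejected because `a` is declared as a plain `char`, and a plain `char` cannot be subscripted. Presumably the intent was `char *a = "foo";`, in which case the write lands in a string literal, which is undefined behaviour and typically a segfault on systems that keep literals in read-only memory. What follows is only a sketch of a well-defined variant, not anything posted in the thread; the helper name `overwrite_terminator` and its buffer-length parameter are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the thread): copies "foo" into writable,
   caller-supplied storage and then replaces the terminating '\0' with '!'.
   Returns 0 on success, -1 if the buffer cannot hold the 4 bytes. */
int overwrite_terminator(char *buf, size_t buflen)
{
    static const char src[] = "foo";   /* 4 bytes: 'f', 'o', 'o', '\0' */

    assert(buf != NULL);               /* precondition: caller passes real storage */
    if (buflen < sizeof src)
        return -1;

    memcpy(buf, src, sizeof src);      /* buf is writable, unlike a string literal */
    buf[3] = '!';                      /* in bounds, but buf is no longer NUL-terminated */
    return 0;
}
```

Called with a four-byte array, this leaves the bytes `f`, `o`, `o`, `!` in the buffer. Passing the result to `strlen` or `printf("%s")` afterwards would read past the end of the array, which is the kind of slow-burning bug the posts above are joking about.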

will deakin

Sep 7, 2001, 4:40:55 AM
Bah! Only 400Mb of files! Rebuild mozilla and you get over 2Gb of files!

:)w

j...@itasoftware.com

Sep 7, 2001, 8:57:26 AM
t...@conquest.OCF.Berkeley.EDU (Thomas F. Burdick) writes:

> when you haven't installed a SIGSEGV handler isn't likely to produce
> too much, but "meltdown" implies something more dramatic. Hell, I
> once had a C++-based meltdown whose core filled up all the available
> disk space, which caused some error that the program exacerbated with
> its "handler", which caused the kernel to panic, which killed the file
> systems. Or so went the (unhappy) sysadmin's best piecing together of
> events...
>
> I have to say, I've not done anything *nearly* that dramatic in Lisp
> (yet?)

I've heard of someone who accidentally `garbage collected' the video
controller and burnt out their monitor.

(Some video controllers can be programmed to send signals way outside
the specs for some monitors, and some monitors don't guard against
such things. The GC had a bug and it encountered an untagged value.
It `transported' the `object' on top of the memory-mapped display
controller. The screen went black, there was a high-pitched whine and
then a loud *SNAP*.)

Come to think of it, this does sound like an Urban Legend, though.

Rob Warnock

Sep 7, 2001, 9:40:31 PM
Tim Bradshaw <t...@cley.com> wrote:
+---------------
+---------------

<advert mode="blatant">
Well, we'll *gladly* sell you a *much* bigger machine:

<URL:http://www.sgi.com/origin/3000/3800.html>
<URL:http://www.sgi.com/products/storage/9400.html>

Up to 512 CPUs, up to 1TB main memory (cache-coherent to all CPUs), up
to 8TB of disk per 19" rack. And if you buy enough disk controllers
and disks, up to ~10GB/sec of Unix filesystem I/O through a single file
descriptor.

'Course, it'll cost you... ;-} ;-}
</advert>


-Rob

p.s. Obligatory vaguely on-topic question: Anybody make
a CL that can make good use of a *large* number of CPUs...?

-----
Rob Warnock, 30-3-510 <rp...@sgi.com>
SGI Network Engineering <http://reality.sgi.com/rpw3/>
1600 Amphitheatre Pkwy. Phone: 650-933-1673
Mountain View, CA 94043 PP-ASEL-IA

Bulent Murtezaoglu

Sep 7, 2001, 9:52:05 PM
>>>>> "RW" == Rob Warnock <rp...@rigden.engr.sgi.com> writes:
[...]
RW> p.s. Obligatory vaguely on-topic question: Anybody make a CL
RW> that can make good use of a *large* number of CPUs...?

You probably already know this: I think BBN had a CL for the Butterfly.
How large is large? The Butterfly went up to 128 or 256 (I am sure of
128, I just don't remember if that was the largest).

OB question in the same vein: which CL's handle 64bit address spaces?

cheers,

BM

Kevin Rosenberg

Sep 7, 2001, 11:23:58 PM
On Sat, 08 Sep 2001 01:52:05 GMT, Bulent Murtezaoglu <b...@acm.org> wrote:
>OB question in the same vein: which CL's handle 64bit address spaces?

ACL supports 64 bit addressing on HP-UX and Tru64.

--
Kevin Rosenberg, M.D.
ke...@rosenberg.net
