Tim Bradshaw <t...@tfeb.org> writes:
> Lars Marius Garshol <lar...@ifi.uio.no> writes:
> > * Lars Marius Garshol
> > |
> > | I've stuffed the chars into a vector (using vector-push-extend) and
> > | then used coerce to make a string of it.
> > * Kent M. Pitman
> > |
> > | Coerce can make a string from a list. You don't have to first convert
> > | it to a vector. (coerce '(#\a #\b #\c) 'string)
> > I assumed the vector approach to be more effective (and made sure to
> > allocate a reasonable initial-size vector to begin with). That is a
> > reasonable assumption, no?
> No, it's a bad assumption. Your approach allocates a vector (why not
> a string, in the first place?) and then makes a string with the same
> contents (remember that COERCE does not modify its argument, it makes
> a new object `corresponding to' its argument but with the right type).
> Using COERCE directly will traverse the list (I guess) twice, but only
> allocate a single string of the right length.
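A minimal sketch of the two routes being compared, assuming the characters start out in a list. The function names are hypothetical; only COERCE, MAKE-ARRAY and VECTOR-PUSH-EXTEND are standard.

```lisp
;; Route 1: COERCE straight from the list.  One string is allocated.
(defun chars->string (chars)
  "Return a fresh string whose elements are the characters in the list CHARS."
  (coerce chars 'string))

;; Route 2: the vector detour.  A general (element type T) adjustable vector
;; is filled first, and COERCE then has to allocate a second object, a string
;; with the same contents, because a general vector is not a string.
(defun chars->string-via-vector (chars)
  (let ((v (make-array (length chars) :adjustable t :fill-pointer 0)))
    (dolist (c chars)
      (vector-push-extend c v))
    (coerce v 'string)))
```

Both return a string equal to "abc" for the list (#\a #\b #\c), and in neither case does COERCE modify the object it is given.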
Well, making a string using the vector-push-extend approach is not all that difficult. In LW4.1,
EVENT-SEARCH 20 > (setq x (make-array 20 :element-type 'base-char :adjustable t :fill-pointer 0)) ""
EVENT-SEARCH 21 > (vector-push-extend #\a x) 0
EVENT-SEARCH 22 > (vector-push-extend #\b x) 1
EVENT-SEARCH 23 > (vector-push-extend #\c x) 2
EVENT-SEARCH 24 > x "abc"
EVENT-SEARCH 25 > (stringp x) T
The only difference when using another version of lisp is the type of character you use to make up the elements of the array. I believe in MCL, for instance, it's base-character.
* Lars Marius Garshol
|
| I assumed the vector approach to be more effective (and made sure to
| allocate a reasonable initial-size vector to begin with). That is a
| reasonable assumption, no?
* Tim Bradshaw
|
| No, it's a bad assumption. Your approach allocates a vector (why
| not a string, in the first place?) and then makes a string with the
| same contents (remember that COERCE does not modify its argument, it
| makes a new object `corresponding to' its argument but with the
| right type).
You seem to be correct about this. I didn't know this about COERCE, and I also wasn't aware of the possibility of directly modifying strings like this. (The explanation of this perhaps surprising ignorance is that I'm a Lisp newbie. :)
Compilation of file /home/tosca/c1/larsga/privat/prog/clisp/meta/timer.lsp is finished.
0 errors, 0 warnings
[larsga@pc-larsga meta]$ clisp timer
Trying vector approach.
Real time: 0.470378 sec.
Run time: 0.47 sec.
Space: 1800072 Bytes
GC: 3, GC time: 0.04 sec.
Trying string approach.
Real time: 0.179193 sec.
Run time: 0.18 sec.
Space: 400072 Bytes
GC: 1, GC time: 0.01 sec.
[larsga@pc-larsga meta]$ lisp timer.lsp
;;; *** Don't forget to edit /var/lib/cmucl/site-init.lisp! ***
CMU Common Lisp 18a+ release x86-linux 2.4.7 6 November 1998 cvs, running on pc-larsga
Send bug reports and questions to your local CMU CL maintainer, or to
pvane...@debian.org
or to cmucl-h...@cons.org. (prefered)
type (help) for help, (quit) to exit, and (demo) to see the demos
Loaded subsystems:
    Python 1.0, target Intel x86
    CLOS based on PCL version: September 16 92 PCL (f)
* (load "timer")
; Loading #p"/home/tosca/c1/larsga/privat/prog/clisp/meta/timer.lsp".
Trying vector approach.
Compiling LAMBDA NIL:
Compiling Top-Level Form:
[GC threshold exceeded with 2,003,432 bytes in use. Commencing GC.]
[GC completed with 106,136 bytes retained and 1,897,296 bytes freed.]
[GC will next occur when at least 2,106,136 bytes are in use.]
Evaluation took:
  1.9 seconds of real time
  1.15 seconds of user run time
  0.04 seconds of system run time
  [Run times include 0.09 seconds GC run time]
  837 page faults and
  2232128 bytes consed.
Trying string approach.
Compiling LAMBDA NIL:
Compiling Top-Level Form:
Evaluation took:
  0.02 seconds of real time
  0.02 seconds of user run time
  0.0 seconds of system run time
  0 page faults and
  399304 bytes consed.
*
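The source of timer.lsp isn't shown, so the following is only a hypothetical reconstruction of the two approaches being timed, assuming a fixed number of characters is pushed in each case. The names and the count are made up, and the absolute numbers will differ between implementations.

```lisp
(defparameter *char-count* 100000
  "Hypothetical number of characters to accumulate in each approach.")

(defun build-via-vector (n)
  "Push N characters onto a general adjustable vector, then COERCE to a string."
  (let ((v (make-array 20 :adjustable t :fill-pointer 0)))
    (loop repeat n do (vector-push-extend #\a v))
    ;; COERCE copies the contents into a second, freshly allocated string.
    (coerce v 'string)))

(defun build-via-string (n)
  "Push N characters directly onto an adjustable string; no final copy."
  (let ((s (make-array 20 :element-type 'character
                          :adjustable t :fill-pointer 0)))
    (loop repeat n do (vector-push-extend #\a s))
    s))

;; TIME reports run time and allocation on *TRACE-OUTPUT*.
(time (build-via-vector *char-count*))
(time (build-via-string *char-count*))
```

Besides the extra copy, one plausible contributor to the difference in space is that a general vector typically stores each element in a full machine word, while a character string can often pack characters more densely; how much this matters varies by implementation.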
| Using COERCE directly will traverse the list (I guess) twice, but
| only allocate a single string of the right length.
As I've pointed out before, this is all done under the assumption that the initial character source is not a list.
* Sunil Mishra
|
| The only difference when using another version of lisp is the type
| of character you use to make up the elements of the array. I believe
| in MCL, for instance, it's base-character.
Is there a standardized way to do this? It really would be nice to have extensible strings for this, since in my case I'm doing this in an OMG IDL parser (which preferably shouldn't barf on long names).
Lars Marius Garshol wrote:
> You now assume the characters initially came from a list. Mine come
> from a character stream and the original poster wrote "assemble a
> string from various chars...or from maybe a list of chars".
> In the case you assume, I'd use coerce.
Right -- I assumed they came from a list. At the point I joined the thread KMP had said "COERCE can make a string from a list" and you'd said "I thought going via vectors was more efficient"; it was clear that KMP thought you were taking your characters from a list, and I sort of unconsciously assumed you were since you didn't object. :-)
-- Gareth McCaughan Dept. of Pure Mathematics & Mathematical Statistics, gj...@dpmms.cam.ac.uk Cambridge University, England.
> * Sunil Mishra
> |
> | The only difference when using another version of lisp is the type
> | of character you use to make up the elements of the array. I believe
> | in MCL, for instance, it's base-character.
> Is there a standardized way to do this? It really would be nice to
> have extensible strings for this, since in my case I'm doing this in
> an OMG IDL parser (which preferably shouldn't barf on long names).
> --Lars M.
This *is* the standard way. You can get the element type of a string by typing:
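A minimal sketch of such a query, using the standard function ARRAY-ELEMENT-TYPE; the wrapper name is hypothetical, and the symbols it returns vary between implementations.

```lisp
(defun string-element-types ()
  "Return the element types this implementation actually uses for a literal
string, for MAKE-STRING, and for MAKE-ARRAY with :ELEMENT-TYPE 'CHARACTER."
  (list (array-element-type "foo")
        (array-element-type (make-string 3))
        (array-element-type (make-array 3 :element-type 'character))))

;; Each element of the result is a subtype of CHARACTER; which ones appear
;; depends on the implementation.
(print (string-element-types))
```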
> Is there a standardized way to do this? It really would be nice to
> have extensible strings for this, since in my case I'm doing this in
> an OMG IDL parser (which preferably shouldn't barf on long names).
Yes, this should be easy. You can use your approach of an adjustable array with a fill pointer, but make the element-type be whatever is right (I forget what it is now for strings, but the HyperSpec will say).
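> [larsga@pc-larsga meta]$ lisp timer.lsp
> ;;; *** Don't forget to edit /var/lib/cmucl/site-init.lisp! ***
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Not wanting to pick on you, but why haven't you done this? Should I automatically fill in the "right" values at installation time?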
Groetjes, Peter
-- It's logic Jim, but not as we know it. | pvane...@debian.org for pleasure, "God, root, what is difference?",Pitr | pvane...@inthan.be for more pleasure!
* Lars Marius Garshol
|
| [larsga@pc-larsga meta]$ lisp timer.lsp
| ;;; *** Don't forget to edit /var/lib/cmucl/site-init.lisp! ***
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* Peter Van Eynde
|
| Not wanting to pick on you, but why haven't you done this?
Simply because this is an installation I did on my computer at work just to check out CMU-CL. I did that a couple of days ago and so far haven't run CMU-CL more than 3-4 times, so I haven't bothered yet.
| Should I automatically fill in the "right" values at installation
| time?
Sorry, I don't understand this question.
On 18 Mar 1999 13:19:36 +0100, Lars Marius Garshol wrote:
>| Should I automatically fill in the "right" values at installation
>| time?
>Sorry, I don't understand this question.
You're supposed to fill in a bit of trivial data like the site-name. I was wondering if people would prefer the installation routine to ask for this when installing, rather than bugging people to edit this file.
Groetjes, Peter
In article <slrn7f1uqg.70u.pvane...@mail.inthan.be>, Peter Van Eynde <pvane...@inthan.be> wrote:
>You're supposed to fill in a bit of trivial data like the site-name.
>I was wondering if people would prefer the installation routine to
>ask for this when installing, rather than bugging people to edit
>this file.
This person wouldn't. I convert your packages to rpms with "alien"; rpm installation is traditionally a non-interactive thing.
You could make cmuclconfig sort it out, though. That's interactive anyway.
> On 18 Mar 1999 13:19:36 +0100, Lars Marius Garshol wrote:
> >| Should I automatically fill in the "right" values at installation
> >| time?
> >Sorry, I don't understand this question.
> You're supposed to fill in a bit of trivial data like the site-name.
> I was wondering if people would prefer the installation routine to
> ask for this when installing, rather than bugging people to edit
> this file.
IMHO, asking people to edit files at installation time is not a good thing.
Cheers
-- Marco Antoniotti =========================================== PARADES, Via San Pantaleo 66, I-00186 Rome, ITALY tel. +39 - 06 68 10 03 17, fax. +39 - 06 68 80 79 26 http://www.parades.rm.cnr.it/~marcoxa
Marco Antoniotti <marc...@copernico.parades.rm.cnr.it> writes:
> pvane...@mail.inthan.be (Peter Van Eynde) writes:
> > You're supposed to fill in a bit of trivial data like the site-name.
> > I was wondering if people would prefer the installation routine to
> > ask for this when installing, rather than bugging people to edit
> > this file.
> IMHO, asking people to edit files at installation time is not a good thing.
I agree with Marco.
Absolutely under no circumstances should one edit a file upon installation. The list of ways this can go wrong is large and varied. It's not to say I've never done it. But neither I nor anyone should ever feel they have followed normal, healthy engineering practice when they do. It's an ugly kludge that one should feel embarrassed about and should strive to eliminate.
* It is equivalent to editing source code for software reuse. It is the antithesis of modular programming. It is the worst of what C++ has to offer in the way of so-called oo design.
* This completely thwarts the ability to use md5 and other checksum tools to verify the integrity of an installed piece of foreign software.
* It risks that people who don't competently edit it will create problems they are not aware of and nightmares for people having to support them. It makes it impossible to say "I have foo 1.4" because you don't--you have 1.4 as modified by me. That's a new version. It needs a new name. And everyone will have different names.
* It risks the insertion of bogus or confidential information that will be retransmitted to other sites if someone retransmits the software installation to someone else instead of grabbing it fresh from the source.
* It isn't perspicuous. It's easy to overlook an edit you have to make.
* It forces people to be programmers, continuing the worst trend of computer science, which is to make everyone have to be computer-savvy just to do ordinary tasks rather than making computers be people-savvy.
All installation guides should ever say is "Press install." That's not to say that they shouldn't ask you questions, log what they did for inspection by those who want it, make the installation undoable, etc. But anything more than those two words is a bug in the doc and the system design.
I'm in a very grumpy mood today, so unlike usual, where I say there are two sides to everything and everything is just a trade-off, I'm going to just assert I'm right on this one.
Part of my bad mood may be related to having wasted two weeks learning about how to compile the linux kernel, poking around in network card device drivers, running all manner of configuration and diagnostic tools, etc. just trying to get linux to boot with two network cards. There isn't just a tool that does everything. One always just has to edit a single line of a file. It's just that there are 50,000 linux programmers and 50,000 single lines one might have to edit. There is simply no way that is the right model of anything.
* Peter Van Eynde
|
| You're supposed to fill in a bit of trivial data like the site-name.
| I was wondering if people would prefer the installation routine to
| ask for this when installing, rather than bugging people to edit
| this file.
On Thu, 18 Mar 1999 17:54:04 GMT, Kent M Pitman wrote:
>> IMHO, asking people to edit files at installation time is not a good thing.
>I agree with Marco.
Agreed, the new version will ask for these details.
> [... arguments against editing files ...]
I had written a long explanation of why debian makes certain choices, but this is distinctly off-charter. Let me just say that I/we try to solve this problem the best way we can, and that at least I know of no better system to date (I would be happy to be corrected) and that nearly all of your comments are either already solved or solutions are being discussed[1].
Groetjes, Peter
[1] It's a difficult road to walk _between_ point-and-click dumbing down and sendmail (old style, no m4) configuration...
* Lars Marius Garshol <lar...@ifi.uio.no>
| Is there a standardized way to do this? It really would be nice to have
| extensible strings for this, since in my case I'm doing this in an OMG
| IDL parser (which preferably shouldn't barf on long names).
from the description of the system class STRING in ANSI X3.226:
A string is a specialized vector whose elements are of type CHARACTER or a subtype of type CHARACTER. When used as a type specifier for object creation, STRING means (VECTOR CHARACTER).
so CHARACTER is already standard. there is no need to use BASE-CHAR, and no need to worry about portability problems.
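Following the definition above, an extensible string can be made portably with element type CHARACTER. A sketch of a buffer that starts small and grows as long names are read (the helper name is hypothetical):

```lisp
(defun make-extensible-string (&optional (initial-size 20))
  "Return an empty, adjustable string with a fill pointer.  The element type
CHARACTER is standard, so no implementation-specific name is needed."
  (make-array initial-size
              :element-type 'character
              :adjustable t
              :fill-pointer 0))

;; Usage: VECTOR-PUSH-EXTEND grows the string past its initial size as needed.
(let ((buffer (make-extensible-string 4)))
  (loop for char across "a_rather_long_idl_identifier"
        do (vector-push-extend char buffer))
  (print buffer))
```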
in my view, however, the actual task at hand is to extract a subsequence of a stream's input buffer. I do this with a mark in the stream and avoid copying until absolutely necessary. this, however, requires access to and meddling with stream internals.
a less "internal" solution is to use WITH-OUTPUT-TO-STRING and simply write characters to it until the terminating condition is met, to wit:
(with-output-to-string (name) (stream-copy <input-stream> name :end-test <condition>)) => <string>
the function STREAM-COPY could be defined like this:
(defun stream-copy (input output &key (count -1) end-test filter transform)
  "Copy characters from INPUT to OUTPUT.
COUNT is the maximum number of characters to copy.
END-TEST if specified, causes termination when true for a character.
FILTER if specified, causes only characters for which it is true to be copied.
TRANSFORM if specified, causes its value to be copied instead of character."
  (loop
    (when (zerop count)
      (return))
    (let ((character (read-char input nil :eof)))
      (when (eq :eof character)
        (return))
      (when (and end-test (funcall end-test character))
        (unread-char character input)
        (return))
      (when (or (null filter) (funcall filter character))
        (write-char (if transform (funcall transform character) character)
                    output)))
    (decf count)))
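A smaller, self-contained variant of the same WITH-OUTPUT-TO-STRING idea, sketched for the identifier case. Unlike STREAM-COPY it looks ahead with PEEK-CHAR instead of reading a character and then calling UNREAD-CHAR. READ-WHILE and IDENTIFIER-CHAR-P are hypothetical names, and the identifier rule is an assumption, not the OMG IDL grammar.

```lisp
(defun read-while (predicate stream)
  "Read characters from STREAM while PREDICATE is true of the next one and
return them as a fresh string.  The first character that fails PREDICATE
(if any) is left in STREAM."
  (with-output-to-string (out)
    (loop for char = (peek-char nil stream nil nil)
          while (and char (funcall predicate char))
          do (write-char (read-char stream) out))))

(defun identifier-char-p (char)
  "Assumed identifier rule: letters, digits and underscore."
  (or (alphanumericp char) (char= char #\_)))

;; Usage:
;; (with-input-from-string (in "long_name_42 rest")
;;   (read-while #'identifier-char-p in))
;; returns "long_name_42" and leaves " rest" in the stream.
```

Letting WITH-OUTPUT-TO-STRING own the buffer means there is no initial size to guess, so long names need no special handling.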
Erik Naggum <e...@naggum.no> writes:
> * Lars Marius Garshol <lar...@ifi.uio.no>
> | Is there a standardized way to do this? It really would be nice to have
> | extensible strings for this, since in my case I'm doing this in an OMG
> | IDL parser (which preferably shouldn't barf on long names).
> from the description of the system class STRING in ANSI X3.226:
> A string is a specialized vector whose elements are of type CHARACTER or a
> subtype of type CHARACTER. When used as a type specifier for object
> creation, STRING means (VECTOR CHARACTER).
> so CHARACTER is already standard. there is no need to use BASE-CHAR, and
> no need to worry about portability problems.
The first of the two uses of the word "need" here is odd. Strictly, there is no "need" to use Lisp, nor even to use computers. There is a "need" for food, clothing, and shelter. But if we extend "need" to sometimes mean "want" (which I assume you mean here), then there is sometimes a "need" to use BASE-CHAR because in some implementations CHARACTER may be inefficient (e.g., it might reserve a much more heavy-duty space capable of holding multi-byte characters), and it may sometimes be "necessary" to avoid this. The problem Lars might be perceiving, and I think it's a legit concern in certain limited contexts, is that you can't know when one "needs" to use BASE-CHAR to avoid overallocating space.
I think Erik is saying that one shouldn't pre-optimize something without first knowing the general case will be a problem. And what Lars is saying is that he perceives it will be a problem. This is something of a clash of absolutes and both have some merit. Mostly I think one should probably define Erik's approach to be the most conservative, even if not the one most people do. I, too, prefer to write general code first and get the shape and functionality right, and then to tune as needed where a problem is discovered. Since there is no a priori functional problem with CHARACTER, one should just use it until a problem is discovered. And then one might find one "needs" BASE-CHAR. But not otherwise.
Pre-optimizing the type specifier before knowing that CHARACTER leads to problems is mostly a bad idea. It needlessly increases program complexity at a time when you're just exploring what you want from your program. You may later throw away that line of code, and there's no sense in having optimized it. Or you may later find the code doesn't get enough play and doesn't need a declaration.
Life is short, and one isn't meant to spend it writing gratuitous declarations that don't actually do anything other than make code harder to write. At least, that's my own personal religious belief. (Apologies in advance to those in this multicultural forum if I've stepped on the toes of anyone whose religion teaches them that this IS a good way to spend one's life.)
Over-optimizing also encourages you to build fragile interfaces. For example, I ran into a bug in some Lisp implementation where the vendor had decided to use simple strings for symbol names. Maybe an OK assumption, but they didn't get it from the book, and they didn't fix INTERN and friends to coerce symbol names to be simple, so when you made a symbol with an adjustable string as its name, it let you do this, but this made a mess. (Note that the spec says "a string", not "a simple string", as the argument to INTERN. It doesn't forbid the implementation from copying the string to be simple, or to have another element-type more suited to the characters it contains, or whatever. But it says this fact shouldn't be revealed to users.)
The particulars of the bug are not as relevant as the point about your responsibility when you narrow a type: your interface points must minimally check the incoming type and preferably should coerce reasonable alternate types, so that people don't have to do what I had to do while learning some Java a few weeks ago, namely cast an "integer represented as an object" back to an integer by something nutty like [if memory serves me right--my Java memory is very flaky]: ((Integer)someObject).intValue(), having to do not one but two coercions just to say "this is in fact the integer it looks like". I don't want to get into a long discussion about Java, or my inability to navigate it smoothly, or the fact that this clumsiness doesn't overwhelm me with a desire to give up Lisp for it. My point here is simply this: type restriction has its place, but it also has its cost. And you should try to avoid paying the cost, because that cost can include infection of unrelated modules with needless paranoia. (For varying values of "need" again.)
In the INTERN case above, the spec said it wasn't supposed to work the way the implementation did it, so it was easy for me to complain, but when you design your own interfaces your users won't have that luxury--they have to do what you implement. So make sure you're being sensitive to what's rational for them to use.
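A minimal sketch of what that responsibility can look like, assuming a hypothetical registry that, like the implementation described above, wants to keep only simple strings. It is not how any particular INTERN works; it only shows an interface point coercing its argument rather than trusting it.

```lisp
(defvar *names* (make-hash-table :test #'equal)
  "Hypothetical registry keyed by simple strings.")

(defun register-name (name)
  "Accept any string designator (string, symbol or character) and store a
fresh SIMPLE-STRING copy of it, so callers may pass adjustable strings."
  ;; COPY-SEQ on a vector returns a fresh simple array with the same actual
  ;; element type, so later changes to an adjustable argument cannot leak
  ;; into the stored key.
  (let ((simple (copy-seq (string name))))
    (setf (gethash simple *names*) t)
    simple))
```

For example, a name accumulated in an adjustable, fill-pointered buffer comes back as a simple string equal to the buffer's active contents, and the caller remains free to keep reusing the buffer.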
- - - -
If one does feel compelled to pre-optimize this, what I recommend doing is something akin to:
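A sketch of the idea, assuming the constant is derived from a literal string with ARRAY-ELEMENT-TYPE and then passed to MAKE-ARRAY; MAKE-NAME-BUFFER is a hypothetical helper.

```lisp
;; Element type the implementation chose for the literal string "foo".  This
;; may be BASE-CHAR or CHARACTER (or another subtype of CHARACTER), depending
;; on the implementation.
(defconstant +base-char-type+ (array-element-type "foo"))

(defun make-name-buffer (&optional (size 20))
  "Hypothetical helper: an adjustable, fill-pointered string using the same
element type as literal strings.  Pushing a character outside that type
(e.g. a non-base character when the type is BASE-CHAR) is not portable and
typically signals an error."
  (make-array size
              :element-type +base-char-type+
              :adjustable t
              :fill-pointer 0))
```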
This is a kludge and isn't quite 100% right because theoretically the system could allocate a more restricted string representation for the constant string "foo" of known character composition than it would for base-char, but in practice I haven't observed implementations to do this and so the kludge is pretty portable.
Sometimes MAKE-STRING saves you from doing this, but MAKE-STRING doesn't take all the arguments MAKE-ARRAY does (there is no :ADJUSTABLE or :FILL-POINTER, for example), so in practice I've had to do this in some cases.
(Sometimes you'll run into cases where the +base-char-type+ needs to be used in a not-for-evaluation situation and it may be helpful to use #.+base-char-type+ in that case. If you do this, an EVAL-WHEN around the DEFCONSTANT to make sure the variable is ready in the read-time environment may be needed. I left it out of the above just to keep things simple.)
Erik Naggum <e...@naggum.no> writes:
> from the description of the system class STRING in ANSI X3.226:
> A string is a specialized vector whose elements are of type CHARACTER or a
> subtype of type CHARACTER. When used as a type specifier for object
> creation, STRING means (VECTOR CHARACTER).
> so CHARACTER is already standard. there is no need to use BASE-CHAR, and
> no need to worry about portability problems.
Erik,
This is exactly what I had thought when I had first tried this in lispworks 3.2. Alas, what I got was
CL-USER 6 > (make-array 10 :element-type 'character)
#(NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL)
Hardly a string... Which had me pretty confused and led me to believe that 'character was not the right type to use. (I probably should have asked, but I was much more of a newbie back then than I am now, and this bit of information became a decontextualized fact over time.)
* Sunil Mishra <smis...@whizzy.cc.gatech.edu>
| This is exactly what I had thought when I had first tried this in
| lispworks 3.2. Alas, what I got was
|
| CL-USER 6 > (make-array 10 :element-type 'character)
| #(NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL)
ouch. (I'm glad it has since been fixed.)
| Hardly a string... Which had me pretty confused and led me to believe that
| 'character was not the right type to use. (I probably should have asked,
| but I was much more of a newbie back then than I am now, and this bit of
| information became a decontextualized fact over time.)
stuff like this is why I think programmers who don't read specifications learn bad habits: failure to get what you expect must be investigated and the culprit must actually be _found_: either you did something wrong, or somebody else did something wrong. "oh, that didn't work, let's try something else" is good if you deal with the physical world and people, but when you're dealing with computers and programming languages, it's the _last_ property of the physical world I want to imitate. if an expectation doesn't come true, either the expectation is wrong, you made a mistake in preparation for it, or there is a flaw in the system. if you don't do the work necessary to figure out which of these three is the right one, you have a 1/3 chance of getting it right by luck. I think the most important desideratum for a programmer is an _unwillingness_ just to try something until it works -- a good programmer needs to know _why_.