sd...@sdgs.com (Randy) writes:
> Hi all, I'm having some trouble reading from a file.
> The text file is in the form:
>
> Cat.
> Bird.
> Dog.
>
> I need to do the equivalent of (setq animals '(Cat. Bird. Dog.)) but I
> need to read the elements of the list (however many there are in the
> file) from the text file.
I'm assuming this is not a homework problem. (No one I know ever assigns anything useful like file I/O for homework. Sigh.)
Solution depends a lot on how the words are separated. READ reads lisp expressions, and "Cat.", etc. are technically lisp expressions. To retain case but still use READ, you have to use an appropriate readtable.
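As an illustration of the "appropriate readtable" remark, here is a minimal sketch that binds a fresh readtable whose case mode is :PRESERVE around the calls to READ. The function name READ-PRESERVING-CASE and the end-of-file marker are made up for this example; READTABLE-CASE, COPY-READTABLE and *READ-EVAL* are standard.

```lisp
(defun read-preserving-case (stream)
  "Read every expression from STREAM, keeping symbol case as written."
  (let ((*readtable* (copy-readtable nil)) ; fresh copy of the standard readtable
        (*read-eval* nil)                  ; do not evaluate #. forms from data
        (eof (list nil)))                  ; unique end-of-file marker
    (setf (readtable-case *readtable*) :preserve)
    (loop for form = (read stream nil eof)
          until (eq form eof)
          collect form)))
```

With input text "Cat. Bird. Dog." this should give three symbols whose names are "Cat.", "Bird." and "Dog.", which print as |Cat.| and so on under the standard readtable.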
READ-LINE will read lines of text. The result is a string, which I would think would be better than a symbol. I can't really seriously believe you want symbols with dots in their names, but it is a possible thing. See the function INTERN if you want to convert a string to a symbol.
  (defun read-the-file (filename)
    (with-open-file (stream filename)
      (loop for line = (read-line stream nil nil)
            while line
            collect line)))
will return ("Cat." "Bird." "Dog.")

If instead you use (read stream nil nil), you'll get (CAT. BIRD. DOG.)

If you collect (intern line) instead of line, you'll get (|Cat.| |Bird.| |Dog.|). (Don't wrap the READ-LINE call itself in INTERN; READ-LINE returns NIL at end of file, and INTERN wants a string.)

You could also write your own reader to deal with custom separator chars and return value type. For example:
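The definition that followed "For example:" is not present in this copy of the thread. What follows is only a sketch of what a READ-WORD along those lines could look like, not the original code. The helper WHITESPACE? and its choice of four whitespace characters are assumptions, and READ-WORDS is added just to show usage.

```lisp
(defun whitespace? (ch)
  "Assumed definition: space, tab, newline and return count as whitespace."
  (member ch '(#\Space #\Tab #\Newline #\Return)))

(defun read-word (stream)
  "Return the next whitespace-delimited word as a string, or NIL at end of file."
  (let ((start-char (loop for ch = (read-char stream nil nil)
                          while (and ch (whitespace? ch))
                          finally (return ch))))
    (when start-char
      (with-output-to-string (out)
        (write-char start-char out)
        ;; Collect characters up to the next whitespace or end of file.
        (loop for ch = (read-char stream nil nil)
              while (and ch (not (whitespace? ch)))
              do (write-char ch out))))))

(defun read-words (stream)
  "Collect every word READ-WORD can find on STREAM."
  (loop for word = (read-word stream)
        while word
        collect word))
```

Changing WHITESPACE? to also accept #\, would be one way to treat commas as separators without going near the Lisp reader.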
Then if you use (read-word stream) instead of the (read-line stream nil nil) you will end up able to have Cat. and Bird. and Dog. all on one line with only whitespace between. You also have better control over what happens if you do "Cat, Dog, etc." since "," is a character that Lisp doesn't want to see in places that English likes it to be.
Robert Monfera <monf...@fisec.com> writes:
> Hi there,
>
> Kent M Pitman wrote:
> ...
> > Solution depends a lot on how the words are separated.
>
> Speaking of file formats, tab-delimited text is a very common one (where
> spaces are part of fields), and there are a bunch of others. What is
> the common practice here? It's very easy to quickly put together some
> code, I am just wondering if people prefer do this, or use some public
> interface libraries that maybe cover multiple formats such as .csv, .dbf
> or .wk1 in the spirit of reuse.
I don't personally know of a library that does this, but there may be one. You could poke around at the ALU's interim web site. http://www.elwoodcorp.com/alu/
The thing is, though, it's so completely trivial to write that many people probably don't include a library just because finding the name of the library to use could take about as long as writing the 10 lines of code. I don't mean to attach a value judgment to that; I'm all for having shared libraries. But as a practical matter, people do resist writing them when the amount of work they save is relatively small.
* Robert Monfera wrote:
> Speaking of file formats, tab-delimited text is a very common one (where
> spaces are part of fields), and there are a bunch of others. What is
> the common practice here? It's very easy to quickly put together some
> code, I am just wondering if people prefer do this, or use some public
> interface libraries that maybe cover multiple formats such as .csv, .dbf
> or .wk1 in the spirit of reuse.
This is heresy of the worst kind, but when I have to do this I use the normal string-bashing tools -- some combination of awk, sed, perl and other normal Unix stuff -- to read the format and spit out something Lisp can read easily. That lets me do the interesting bit in Lisp and the boring bit in tools better suited to boring problems.
I'm reassured by the fact that people I know who do really serious data-mashing stuff in C *also* use this technique (perl for input processing basically).
>> Kent M Pitman wrote:
>> ...
>> > Solution depends a lot on how the words are separated.
>>
>> Speaking of file formats, tab-delimited text is a very common one (where
>> spaces are part of fields), and there are a bunch of others. What is
>> the common practice here? It's very easy to quickly put together some
>> code, I am just wondering if people prefer do this, or use some public
>> interface libraries that maybe cover multiple formats such as .csv, .dbf
>> or .wk1 in the spirit of reuse.
>
>I don't personally know of a library that does this, but there may
>be one. You could poke around at the ALU's interim web site.
> http://www.elwoodcorp.com/alu/
>
>The thing is, though, it's so completely trivial to write that many
>people probably don't include a library just because finding the name
>of library name to use could take about as long as writing the 10 lines
>of code. I don't mean to attach a value judgment to that; I'm all for
>having shared libraries. But as a practical matter, people do resist
>writing them when the amount of work they save is relatively small.
You might find 'split-sequence' useful. The implementation given below was co-evolved in this newsgroup half a year ago:

  (defun split-sequence (delimiter seq
                         &key (empty-marker nil keep-empty-subseqs)
                              (from-end nil)
                              (start 0)
                              (end nil)
                              (test nil test-supplied)
                              (test-not nil test-not-supplied)
                              (key nil key-supplied)
                         &aux (len (length seq)))
    ;; The DEFUN line and lambda list did not survive in this copy of the
    ;; post; they are inferred here from the variables the body uses.
    "Return list of subsequences in SEQ delimited by DELIMITER.
  If an EMPTY-MARKER is supplied, empty subsequences will be represented
  by EMPTY-MARKER, otherwise they will be discarded. All other keywords
  work analogously to POSITION."
    (unless end (setq end len))
    (when from-end
      (setf seq (reverse seq))
      (psetf start (- len end)
             end   (- len start)))
    (loop with other-keys = (nconc (when test-supplied
                                     (list :test test))
                                   (when test-not-supplied
                                     (list :test-not test-not))
                                   (when key-supplied
                                     (list :key key)))
          for left = start then (+ right 1)
          for right = (min (or (apply #'position delimiter seq
                                      :start left other-keys)
                               len)
                           end)
          if (< left right)
            collect (subseq seq left right)
          else when keep-empty-subseqs
            collect empty-marker
          ;; = rather than EQ: EQ is not guaranteed to work on integers.
          until (= right end)))
That way one could read in the complete file at once into a string (using READ-SEQUENCE) and do all the parsing in Lisp.
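A minimal sketch of that whole-file approach, assuming the file fits comfortably in memory. FILE-TO-STRING and SPLIT-ON are names invented for this example (SPLIT-ON is a deliberately tiny stand-in so the sketch runs on its own), and "animals.txt" in the usage note is a hypothetical file name.

```lisp
(defun file-to-string (filename)
  "Return the whole contents of FILENAME as one string."
  (with-open-file (stream filename)
    (let* ((buffer (make-string (file-length stream)))
           (count (read-sequence buffer stream)))
      ;; FILE-LENGTH can overestimate the number of characters
      ;; (multi-byte encodings, CRLF translation), so trim to what
      ;; READ-SEQUENCE actually delivered.
      (subseq buffer 0 count))))

(defun split-on (delimiter string)
  "Split STRING at each DELIMITER character, dropping empty pieces."
  (loop with len = (length string)
        for left = 0 then (1+ right)
        for right = (or (position delimiter string :start left) len)
        when (< left right)
          collect (subseq string left right)
        until (>= right len)))
```

Usage would be something like (split-on #\Newline (file-to-string "animals.txt")), and each line could then be split again on #\Tab.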
cheers, Bernhard

--
--------------------------------------------------------------------------
Bernhard Pfahringer
Austrian Research Institute for Artificial Intelligence
http://www.ai.univie.ac.at/~bernhard/
bernh...@ai.univie.ac.at
Tim Bradshaw <t...@tfeb.org> writes:
> * Robert Monfera wrote:
> > Speaking of file formats, tab-delimited text is a very common one (where
> > spaces are part of fields), and there are a bunch of others. What is
> > the common practice here? It's very easy to quickly put together some
> > code, I am just wondering if people prefer do this, or use some public
> > interface libraries that maybe cover multiple formats such as .csv, .dbf
> > or .wk1 in the spirit of reuse.
>
> This is heresy of the worst kind, but when I have to do this I use
> the normal string-bashing tools -- some combination of awk, sed, perl
> and other normal Unix stuff -- to read the format and spit out
> something Lisp can read easily. That lets me do the interesting bit
> in Lisp and the boring bit in tools better suited to boring problems.
For a while I was exchanging numerical data files a lot between Clasp (a Lisp stat package) and other applications, and I settled on the fairly useful hack of putting a list of numbers on every line, with tabs between all of the numbers *and* between the open paren and the first number and the last number and the close paren:
( 1 2 3 )
This let my Lisp program read things in normally, and just created a couple of garbage columns in other stat packages I was using.
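The reason this works is that tabs are ordinary whitespace to the Lisp reader, so each such line is just a list. A small sketch, where READ-NUMBER-ROW is a name made up for the example:

```lisp
(defun read-number-row (line)
  "Read one line of the form (<tab>1<tab>2<tab>3<tab>) as a list."
  (let ((*read-eval* nil))            ; never evaluate #. forms from data files
    (values (read-from-string line))))

;; Build a sample line with real tab characters and read it back.
(read-number-row (format nil "(~C1~C2~C3~C)" #\Tab #\Tab #\Tab #\Tab))
;; => (1 2 3)
```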
> Is someone still maintaining it, correcting typos, etc.?
This is a popular question. The answer is a good deal more complicated than you probably expected. Here goes...
I am going to answer for what I know, but you should keep in mind that I don't speak for Harlequin, who claim the name Common Lisp HyperSpec as a trademark, and who own copyright in the hypertext markup. (The underlying text copyright ownership is an issue I'll speak to people about privately if they approach me about it, but I try not to comment about it in public.) If your question is one of corporate policy of the document owner, you must ask Harlequin. Information by me below should be regarded as purely anecdotal, historical, trivia, and the like:
Even when I was at Harlequin, no one was "maintaining it and correcting typos" in the sense that you probably mean. That is, the typos are largely in the underlying ANSI CL spec, not in the hypertext layer of the document. (I was and am redirecting typos reported about CLHS as implicit requests for J13 to do something, but that's a separate matter.) It was/is important to the integrity of the document that the hypertext be precisely what is in the ANSI CL hardcopy. Once you fix typos, a divergence arises, and some such divergences could create material disputes over meaning. I and others wanted to avoid that where possible. True typos are things you can read past; if they are "typos that matter" one must be very wary of fixing them quietly. And historical documents are historical documents; one doesn't update spellings in the Declaration of Independence (or whatever your country's equivalent of that might be :-).
ANSI CL is still maintained through the ANSI process (NCITS committee J13, formerly known as X3J13). I and others will continue to be doing that, but that's a long-arc timeline between updates. A J13 meeting is coming up, though.
Back to CLHS, as I said, its status is something you could approach Harlequin to ask about, since that particular hypertextification item is copyrighted by them. There was some talk of having me continue to maintain it, but it was left in limbo for various reasons I'm going to try not to go into here. [Bottom line: if they want me to do it, they need to contact me and talk to me about the terms under which that might be done. They should not think they are waiting for me to contact them. If I were to decide to do something new, it would probably be to start over from the public TeX sources and write all-new code to do the conversion so that the result was mine to control and I didn't have to risk later having to again ask someone else's permission for the right to update something that came from the sweat of my own brow, as it were. I'm not necessarily likely to mount such an effort, especially absent funding to do so, but that would be what I would be inclined to do if I did get the urge, I guess is what I'm saying.]
At any rate, the virtue of CL qua language is its stability, so the fact that documents about it don't change regularly is not an automatic thing to panic about.
Little known CLHS versioning trivia:
Last I checked, the main version of CLHS that Harlequin distributes is version 4. Versions 1 and 2 were internal only; you never saw them unless you worked at Harlequin. Version 3 was the initial rollout; most people probably have that. You can find the version identifier in the HTML source code of every page. I recommend that you do NOT race to replace v3 with v4; the *only* change is a one-word legally required change in a trademark claim to claim "Liquid Common Lisp" instead of "Lucid Common Lisp". It's not worth downloading a whole new copy for that.
There is a version 5 in existence, though. It is different in substance in several ways: it contains 8.3 dos-style filenames, so probably works better on the Mac (there being 2 32-character-long filenames in Version 3 which exceed the 31-character Mac limit). Version 5 differs also in that it has some minor corrections to the HTML markup, and majorly better indexing of the format ops and sharpsign read macros. (The CLHS index is not part of the underlying X3J13 document, so is something I could update without deviating from the ANSI CL spec.) Version 5 also does not have the dorky little Java widget on the Symbol Index page that never worked right for me back when version 3 first issued (earlier versions of Netscape, and all that) and that finally got me fed up with Java enough to remove it in version 5. ("Write once, debug everywhere." I got tired of doing so.) In house, some fans of that widget complained, but their complaints fell on my deaf ears. Java might be stable enough to have put it back, but I never got around to doing that before I, uh, "left" Harlequin. Anyway, if you liked that Java widget as your customary interface, version 5 might seem like a bit of a downgrade. I'd always meant to make a v6 to fix that... Oh well.
[Free advice to Harlequin for what it's worth: Because so many people have by now probably bookmarked individual pages within CLHS (against my examples, btw; I have always stubbornly resisted posting individual pointers to pages, preferring instead to cite the main page and give English navigation instructions to the detail page in order to preserve the possibility of changing the internal URLs without invalidating a zillion DejaNews items), it would not be a good plan, in my personal opinion, for Harlequin to wholesale replace v3 with v5 on their web site without ALSO either (1) making a shadow directory containing HTML stubs for each of the old pages, redirecting people to each of the corresponding new pages, or (2) perhaps easier to do: telling the Harlequin web site server to specially redirect all references to books/Hyperspec to books/CLHS/Front/index.htm, which is the name of the cover page in the DOS/8.3 filenaming scheme that v5 uses. Absent such a compatibility plan, I'd recommend staying with v4 on the web site, but maybe that's just me.]
Incidentally, don't panic that v5 DOS/8.3 names are shorter--I went to enormous trouble to make them also be "predictable" in case there are people out there who like to think they know the algorithm for page naming and type it in raw; the 8.3 filenames are also fairly "predictable", after a fashion. That is, the algorithm, though different, is intended to be learnable. Coming up with an invertible and human-readable algorithm for saying the chapter names to have 21.1 not get confused with 2.1.1 and still fit in 8 characters was fun. A sample is: CLHS/Body/21_aaaa.htm, which is 21.1.1.1.1. The use of alphabetics accommodates some section numbers that roll above 9 but fortunately don't get above 26.
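One reading of that naming scheme, sketched as code. Only the single sample 21.1.1.1.1 -> 21_aaaa.htm comes from the text above; the rest is an assumption made for illustration: that the chapter number is zero-padded to two digits, followed by an underscore, followed by one letter per remaining component (a = 1 ... z = 26). SECTION-NUMBER-TO-FILENAME is a made-up name.

```lisp
(defun section-number-to-filename (section)
  "Map a dotted section number such as \"21.1.1.1.1\" to an 8.3-style file name."
  (let ((parts (loop with start = 0
                     for dot = (position #\. section :start start)
                     collect (parse-integer section :start start :end dot)
                     while dot
                     do (setf start (1+ dot)))))
    (format nil "~2,'0D_~{~C~}.htm"
            (first parts)
            ;; One letter per sub-section number: 1 -> a, 2 -> b, ... 26 -> z.
            (mapcar (lambda (n)
                      (char "abcdefghijklmnopqrstuvwxyz" (1- n)))
                    (rest parts)))))
```

Under those assumptions 21.1 and 2.1.1 come out as "21_a.htm" and "02_aa.htm", so they cannot collide, and the mapping can be inverted by reading the letters back as numbers.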
Oh, and in answer to the big question some of you were probably wondering if I'd get to: To my knowledge, the only way you can get version 5, by the way, is to get a LispWorks. Though the free Personal Edition has it, so it's not like you have to pay dollars. It is not, to my knowledge, available as a separate item at their web site--but then, I haven't looked recently.
And, on balance, the pressure for CLHS to be THE source of hypertext lisp doc is less these days because Franz has an approximate equivalent of the hyperspec that it associates with its product as well. (I think one reason you don't hear as much about it is that they didn't give it a jazzy name--or a name at all that I can discern.) But it seems to have essentially the same underlying reference text. My impression is that it might have been produced from the last "draft" of the CL specification instead of the final version, but if so that's only a legal matter (which I'm going to try not to go into here because it's a rat's nest), not a technical one, since the technicalese in the last draft and the final version was identical.
One thing all this version stuff should tell you is that there's a tension in the world between "the need to fix typos" and "the need to upgrade". If typos were being fixed all the time, people would want to download copies all the time. And that would mean there would be a zillion subtly different versions all over the place. While at Harlequin, when I had a say in such things, I generally resisted making much noise about different versions because it seemed like a lot of effort for people to download a new version for remarkably little benefit. At some point, a new version will be needed, but I think for now the main issue is the care and feeding of the standard, not the care and feeding of its webification. And that's in the hands of a committee, not some single individual. But "web versioning" is still very much a great "unsolved problem". Coordinating updates to something depended on world-wide is tricky; ANSI has long made a whole business out of it.
bernh...@hummel.ai.univie.ac.at (Bernhard Pfahringer) writes:
> You might find 'split-sequence' useful.
Certainly a useful function to have.
> That way one could read in the complete file at once into a string
> (using READ-SEQUENCE) and do all the parsing in Lisp.
For bounded-size files. A serious virtue of the other approach is that it doesn't require you to redundantly buffer the whole file's contents in memory. This exercise in parsing clearly requires a minimum of state on an ongoing basis, and while the solution you propose has that kind of APL feel of piping two powerful operators together to get a nice result, it's not the best way to teach a newbie how to make good engineering choices in a lot of practical settings. Even if the file size starts small, it might grow, and then people start to wonder what's taking up so much space. If the wrong person looks into fixing it, not knowing there are alternatives, it can earn Lisp a bad name for appearing to "not have the good way to do things", and what was a hack for pleasant convenience can turn into a reason that someone at a certain shop thinks Lisp is never appropriate for serious use.
Things like split-sequence should be used where there is strong confidence that the dataset size is bounded. The mere mention of "file" makes me nervous in that regard. Most text editors make it painful enough to parse individual long lines that I'm pretty comfortable about split-sequence being used to split a "line" or a "token", but not a "file". Even though at an abstract level there is an unbroken continuum between tokens, lines, and files, and you can think of files as "mere tokens" conceptually, the practical fact is that there are subtle psychological shifts we make as we move from one datastructure to another, and I think when most people say "file", they mean "might have arbitrary length" and when most people say "line" they mean "probably has bounded length, usually less than 256." I feel pretty comfortable allocating (make-array 256 :element-type 'character :adjustable t :fill-pointer 0) for line buffers, for example, without worrying these will grow under normal use, and without worrying I have to re-adjust them back down in size periodically if they do grow. I feel a lot less sure of file buffers.
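A minimal sketch of how such a line buffer might be used, reading one line at a time into the same array so that memory use depends on the longest line rather than on the file. MAKE-LINE-BUFFER and READ-LINE-INTO are names invented for this example; the MAKE-ARRAY call is the one quoted above.

```lisp
(defun make-line-buffer ()
  "Allocate a reusable, growable line buffer."
  (make-array 256 :element-type 'character :adjustable t :fill-pointer 0))

(defun read-line-into (buffer stream)
  "Fill BUFFER with the next line of STREAM, reusing its storage.
Return true if a line was read, or NIL at end of file."
  (setf (fill-pointer buffer) 0)
  (loop for ch = (read-char stream nil nil)
        do (cond ((null ch)
                  ;; End of file: report a final unterminated line, if any.
                  (return (plusp (fill-pointer buffer))))
                 ((char= ch #\Newline)
                  (return t))
                 ;; Grows the array if the line is longer than its current size.
                 (t (vector-push-extend ch buffer)))))
```

Because the buffer is overwritten on each call, a caller that wants to keep a line has to copy it (COPY-SEQ, or SUBSEQ on the part of interest) before reading the next one.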
None of this really contradicts anything you said. I just worry for newbies (since that was what the subject line said was involved) who might be looking on and thinking this was the green light to not learn about conventional I/O tools, which are there and should be used sometimes.
And all just my personal opinion, of course. Other perspectives welcome.
<pit...@world.std.com> wrote: > Robert Monfera <monf...@fisec.com> writes:
>> Hi there,
>> Kent M Pitman wrote:
>> ...
>> > Solution depends a lot on how the words are separated.
>>
>> Speaking of file formats, tab-delimited text is a very common one (where
>> spaces are part of fields), and there are a bunch of others. What is
>> the common practice here? It's very easy to quickly put together some
>> code, I am just wondering if people prefer do this, or use some public
>> interface libraries that maybe cover multiple formats such as .csv, .dbf
>> or .wk1 in the spirit of reuse.
>
> I don't personally know of a library that does this, but there may
> be one. You could poke around at the ALU's interim web site.
> http://www.elwoodcorp.com/alu/
In article <sfwg15jtqv7....@world.std.com>,
Kent M Pitman <pit...@world.std.com> wrote:
(...)
> and when most people say "line" they mean "probably has bounded length,
> usually less than 256." I feel pretty comfortable allocating
> (make-array 256 :element-type 'character :adjustable t :fill-pointer 0)
> for line buffers, for example, without worrying these will grow under
> normal use, and without worrying I have to re-adjust them back down in
> size periodically if they do grow.
I just want to make sure everybody notices the `mission-critical' arguments there, and those are ``:ADJUSTABLE T'' without which this code can cause much suffering. Dynamically sizing a line buffer is a must, otherwise 256 is *not* enough---even 1024 is not enough, as one version of vi that has this limit has shown me very eloquently. (Especially with all that software that passes for text editors that considers a paragraph to be a line---or should I say a line to be a paragraph. Unfortunately I do not belong to those happy few that can just reject such texts when they arrive by e-mail, for example.)
(...)
> And all just my personal opinion, of course. Other perspectives welcome.
I quite agree that `token' means `no problem buffering, cheap to throw around (e.g. copy),' `line' means `can be buffered with care and attention, may be expensive to throw around,' and `file' means `if you expect to be able to buffer it you deserve to lose when the time comes.'
-- Vassil Nikolov <vniko...@poboxes.com> www.poboxes.com/vnikolov (You may want to cc your posting to me if I _have_ to see it.) LEGEMANVALEMFVTVTVM (Ancient Roman programmers' adage.)
-----------== Posted via Deja News, The Discussion Network ==---------- http://www.dejanews.com/ Search, Read, Discuss, or Start Your Own
* Kent M Pitman <pit...@world.std.com>
| (defun peek-char-after-whitespace (stream)
|   (loop for ch = (read-char stream nil nil)
|         while ch
|         when (not (whitespace? ch))
|         do (return ch)))
I'd've used (peek-char t stream nil nil) for this. have I read the specification too well, again? :)
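For readers who have not met the peek-type argument: with T as its first argument, PEEK-CHAR skips whitespace and then returns the next character without consuming it. A short illustration on a string stream:

```lisp
(with-input-from-string (stream "   Cat. Bird.")
  (list (peek-char t stream nil nil)    ; skips the spaces, leaves #\C unread
        (read-char stream nil nil)))    ; so READ-CHAR still returns #\C
;; => (#\C #\C)
```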
In article <3134473642124...@naggum.no>, Erik Naggum <e...@naggum.no> wrote:
> * Kent M Pitman <pit...@world.std.com>
> | (defun peek-char-after-whitespace (stream)
> |   (loop for ch = (read-char stream nil nil)
> |         while ch
> |         when (not (whitespace? ch))
> |         do (return ch)))
>
> I'd've used (peek-char t stream nil nil) for this. have I read the
> specification too well, again? :)
One difference whose importance depends on the particular problem is that with (PEEK-CHAR T ...) one does not have control over what exactly white space is.
Besides, since Kent Pitman's function above consumes the first non-white-space character, maybe peek-something is not a very appropriate name for it (unless READ-CHAR is replaced by PEEK-CHAR or a call to UNREAD-CHAR is added).
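Both points can be seen in a small sketch: a version that really does peek (it leaves the first non-whitespace character on the stream) and that lets the caller say what whitespace means. The default predicate WHITESPACE-CHAR-P and the optional parameter are assumptions made for this example, not part of the code under discussion.

```lisp
(defun whitespace-char-p (ch)
  "Assumed default: space, tab, newline and return count as whitespace."
  (member ch '(#\Space #\Tab #\Newline #\Return)))

(defun peek-char-after-whitespace (stream &optional (whitespacep #'whitespace-char-p))
  "Skip characters satisfying WHITESPACEP, then return the next character
without consuming it, or NIL at end of file."
  (loop for ch = (peek-char nil stream nil nil)
        while (and ch (funcall whitespacep ch))
        do (read-char stream)           ; consume only the whitespace
        finally (return ch)))
```

Passing a predicate that also accepts #\, gives comma-or-space separated data without touching the readtable.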
-- Vassil Nikolov <vniko...@poboxes.com> www.poboxes.com/vnikolov (You may want to cc your posting to me if I _have_ to see it.) LEGEMANVALEMFVTVTVM (Ancient Roman programmers' adage.)
Erik Naggum <e...@naggum.no> writes:
> * Kent M Pitman <pit...@world.std.com>
> | (defun peek-char-after-whitespace (stream)
> |   (loop for ch = (read-char stream nil nil)
> |         while ch
> |         when (not (whitespace? ch))
> |         do (return ch)))
>
> I'd've used (peek-char t stream nil nil) for this. have I read the
> specification too well, again? :)
I thought about this, and perhaps should have mentioned it. I'll leave it to you to decide. Basically, as a style thing, I only use (peek-char t stream nil nil) when I'm skipping "Lisp program text", not "user data". The reasons are these (some being "better" reasons than others):
* Using it means you have to be happy with the whitespace[2] definition of whitespace, even if you're content to restrict yourself purely to whitespace.
* Using it means you don't think about the fact that you might want to use other separators than whitespace (like ",") which might be useful in application situations.
* Using it didn't give me a chance to illustrate character-level I/O.
* A long time ago, in Maclisp, TYIPEEK (which had an equivalent argument) had the following behavior which CL does not have, but I have lingering fear of (quoting from the Revised Maclisp Manual, a.k.a. Pitmanual):
If bits [the equivalent of the CL peek-type argument] is just T, TYIPEEK will skip over characters of input until the beginning of an S-expression is reached. Splicing macro characters, such as ``;'' comments, are not considered to begin an object. If one is encountered, its associated function is called as usual (so that the text of the comment can be gobbled up or whatever) and TYIPEEK continues scanning characters.
Unrelated trivia (no longer quoting): The way ``splicing macros'' worked is that they were readmacros whose results were "appended" to the input stream (sort of like ,@ in backquote). ; was a splicing macro that returned the empty list. My recollection is that it was possible in principle for splicing macros to return *several* things instead of zero things, but the T argument to TYIPEEK made a mess of things when this happened for reasons you can probably imagine. We fixed this misfeature in CL, explicitly requiring that splicing readmacros return only one or zero values.
* Even if you are happy with the way PEEK-CHAR does things, your code is still at risk that the language standard will change and the "space" (so to speak) of things (PEEK-CHAR T ...) skips will change. This is true of any function, of course, but I regard it as more true of functions that add arbitrary and marginal functionality such as PEEK-CHAR does here. Frankly, I'd be pleased as could be (other than the compatibility nightmare it would cause to make it happen, so I would never vote for it) if this peek-type argument just disappeared and PEEK-CHAR had the same argument signature as READ-CHAR. I get hit by this all the time in new code.
Vassil Nikolov <vniko...@poboxes.com> writes:
> Besides, since Kent Pitman's function above consumes the first
> non-white-space character, maybe peek-something is not a very
> appropriate name
Absolutely right. Now that I think about it, I guess I usually call this one READ-CHAR-AFTER-WHITESPACE.
* Vassil Nikolov <vniko...@poboxes.com>
| One difference whose importance depends on the particular problem is that
| with (PEEK-CHAR T ...) one does not have control over what exactly white
| space is.
really? I control the meaning of whitespace by modifying the readtable in an application. e.g., I have made all control characters into whitespace _except_ newline, which is an important delimiter in my data stream. in what way does this not work? </tongue-in-cheek>
Erik Naggum <e...@naggum.no> writes:
> * Vassil Nikolov <vniko...@poboxes.com>
> | One difference whose importance depends on the particular problem is that
> | with (PEEK-CHAR T ...) one does not have control over what exactly white
> | space is.
>
> really? I control the meaning of whitespace by modifying the readtable
> in an application. e.g., I have made all control characters into
> whitespace _except_ newline, which is an important delimiter in my data
> stream. in what way does this not work? </tongue-in-cheek>
The above text is either mine or very like something I said. It means you can't change the meaning of whitespace independent of what READ wants. [Except by *readtable*, of course. A big hammer. Code starts to look not much simpler than writing your own parser, which IMO is more perspicuous. Don't get me wrong--I've done some pretty crazy things with readtables in my time, including my own versions of the hack I think it was Vaughan Pratt (author of CGOL, the Maclisp-based infix syntax for lisp) used originally where you make every character have the same readtable entry and make that readtable entry launch a custom parser, so you can call (READ) to read fortran programs and the like. READ can be perverted to do some interesting things. But, I guess as a function of my advancing age, and the inevitable spoilsport-like attitude that eventually takes over us oldsters making us no fun to talk to any more, I've come a bit more to the conclusion that some things that are POSSIBLE are nevertheless still not the best way to do things.]
* Kent M Pitman <pit...@world.std.com>
| The above text is either mine or vey like something I said. It means you
| can't change the meaning of whitespace independent of what READ wants.
that's ok, because I use this _with_ READ, to read ordinary Common Lisp forms, except that a bunch of features have been disabled, whitespacitude has been relaxed, and the newline is a terminating macro character.
building my own reader may have been as much work, but starting to build my own reader would have been a lot more work, and it would have been a lot of duplicative effort, anyway. as I have gained experience from usage, I have come to exclude various stuff from that readtable, but I would still do it the same way all over again, because I really don't have the time to write the low-level stuff in a reader. the C complement to my protocol is mostly reader-related, and all it does is attach a type character to the front of a string, and all objects are represented as strings, re-parsed upon demand. call it a cop-out, but there's a lot of hairy stuff that READ does that is too detailed and low-level to work out anew, without effectively designing your own syntax, and _that's_ just plain evil.
| But, I guess as a function of my advancing age, and the inevitable
| spoilsport-like attitude that eventually takes over us oldsters making us
| no fun to talk to any more, I've come a bit more to the conclusion that
| some things that are POSSIBLE are nevertheless still not the best way to
| do things.
oh, stop it! we youngsters like you just the way you are, Kent.
In article <3134571515815...@naggum.no>,
Erik Naggum <e...@naggum.no> wrote:
(...)
> really? I control the meaning of whitespace by modifying the readtable
> in an application. e.g., I have made all control characters into
> whitespace _except_ newline, which is an important delimiter in my data
> stream. in what way does this not work? </tongue-in-cheek>
(I admit that I didn't recognise the above as Kent Pitman's text, but now the mystery of that SGML tag has been resolved.)
I just want to add a few things to what has already been posted.
Perhaps it would be nice if the fact that the meaning of whitespace[2] depends on the current readtable and not on the standard readtable was made a little more explicit in the spec (indeed, *READTABLE* is listed in the `Affected By' section of PEEK-CHAR's description, but perhaps something could be mentioned either in the glossary or in Section 2.1.4.7 (Whitespace Characters) too).
Also, note the idiom for setting the whitespace attribute of a readtable entry: (SET-SYNTAX-FROM-CHAR c #\Space). (It's an idiom to me because SET-SYNTAX-FROM-CHAR copies everything, not just the whitespace attribute.)
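A sketch of that idiom in use, making the comma whitespace in a private readtable so that (PEEK-CHAR T ...) skips it. The names *DATA-READTABLE* and PEEK-PAST-SEPARATORS are invented for the example, and treating #\, as whitespace is just one possible choice.

```lisp
(defparameter *data-readtable*
  (let ((rt (copy-readtable nil)))        ; start from the standard readtable
    ;; Give #\, the syntax of #\Space in RT; the syntax is copied from
    ;; the standard readtable by default.
    (set-syntax-from-char #\, #\Space rt)
    rt)
  "Readtable in which commas count as whitespace.")

(defun peek-past-separators (stream)
  "Return the next character that is not whitespace in *DATA-READTABLE*,
without consuming it, or NIL at end of file."
  (let ((*readtable* *data-readtable*))
    (peek-char t stream nil nil)))
```

Binding *READTABLE* only around the call keeps the change away from any other reading the program does.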
Also, perhaps a standard function (WHITESPACEP &OPTIONAL readtable) that tests this attribute could be useful (I know a kludge that does this, but it is rather inelegant).
-- Vassil Nikolov <vniko...@poboxes.com> www.poboxes.com/vnikolov (You may want to cc your posting to me if I _have_ to see it.) LEGEMANVALEMFVTVTVM (Ancient Roman programmers' adage.)
* Erik Naggum <e...@naggum.no>
| really? I control the meaning of whitespace by modifying the readtable
| in an application. e.g., I have made all control characters into
| whitespace _except_ newline, which is an important delimiter in my data
| stream. in what way does this not work? </tongue-in-cheek>

* Vassil Nikolov <vniko...@poboxes.com>
| (I admit that I didn't recognise the above as Kent Pitman's text, but now
| the mystery of that SGML tag has been resolved.)
huh? there appear to be some attribution problems here. I hope I'm not morphing into Kent Pitman, or vice versa -- I like us separate.
| Perhaps it would be nice if the fact that the meaning of whitespace[2] | depends on the current readtable and not on the standard readtable was | made a little more explicit in the spec (indeed, *READTABLE* is listed in | the `Affected By' section of PEEK-CHAR's description, but perhaps | something could be mentioned either in the glossary or in Section 2.1.4.7 | (Whitespace Characters) too).
I find this to be sufficiently basic to the language design that it would actually be confusing to add it in particular places -- one would have to wonder why it was added.
| Also, perhaps a standard function (WHITESPACEP &OPTIONAL readtable) that | tests this attribute could be useful (I know a kludge that does this, but | it is rather inelegant).
your implementation may sport just such functions, or they might be macros or accessors inlined so strongly that they are not retained in the dumped image.
On Thu, 29 Apr 1999 14:47:17 GMT, Kent M Pitman <pit...@world.std.com> wrote:
>barranqu...@laley-actualidad.es (Juanma Barranquero) writes: >That is, the typos are largely in the underlying ANSI CL spec, not >in the hypertext layer of the document. (I was and am redirecting >typos reported about CLHS as implicit requests for J13 to do >something, but that's a separate matter.) It was/is important to >the integrity of the document that the hypertext be precisely what >is in the ANSI CL hardcopy. Once you fix typos, a divergence >arises, and some such divergences could create material disputes >over meaning. I and others wanted to avoid that where possible.
While I understand that, and I supposed as much, I still feel it would be nice to correct obvious typos, perhaps adding a note with the full original text (and a *big* disclaimer stating that, should a problem with interpretation arise, the reader would be better advised to go read the ANSI CL spec, of course :)
But in fact I didn't mean typos, but things like the alignment errors in the Permuted Symbol Index (where things like "pathname-name" are aligned around the first appearance of the index letter, not the first letter of the corresponding word), especially noticeable in the "P".
>And historical documents are historical documents; one doesn't update >spellings in the Declaration of Independence (or whatever >your country's equivalent of that might be :-).
The "Constitución de Cádiz de 1812" would be a fine example. OTOH, people (scholars, I mean) *do* correct typos in the Quixote, Shakespeare texts, etc. :) But I'm just joking, I understand the rationale pretty well.
>I'm not necessarily likely to mount such an effort, especially >absent funding to do so, but that would be what I would be >inclined to do if I did get the urge, I guess is what I'm saying.]
Well, if you feel that urge sometime (and sitting down for a while doesn't make it pass :) it wouldn't be difficult to find a few of us happy to help, I'd say.
>At any rate, the virtue of CL qua language is its stability, so the >fact that documents about it don't change regularly is not an >automatic thing to panic about.
Yes, of course.
[Going backwards in time...]
>This is a popular question. The answer is a good deal more >complicated than you probably expected. Here goes...
Thanks a lot for taking the time to answer so thoroughly.
> But in fact I didn't mean typos, but things like the alignment errors > in the Permuted Symbol Index (where things like "pathname-name" are > aligned around the first appearance of the index letter, not the first > letter of the corresponding word), especially noticeable in the "P".
Thanks for the tip-off. I imagine this could be easily fixed in a future revision.
While you are welcome to post such issues here, mail to clhs-b...@harlequin.com would be more direct.
In article <3134756936126...@naggum.no>, Erik Naggum <e...@naggum.no> wrote:
> * Erik Naggum <e...@naggum.no> > | really? I control the meaning of whitespace by modifying the readtable > | in an application. e.g., I have made all control characters into > | whitespace _except_ newline, which is an important delimiter in my data > | stream. in what way does this not work? </tongue-in-cheek>
> * Vassil Nikolov <vniko...@poboxes.com> > | (I admit that I didn't recognise the above as Kent Pitman's text, but now > | the mystery of that SGML tag has been resolved.)
> huh? there appear to be some attribution problems here. I hope I'm not > morphing into Kent Pitman, or vice versa -- I like us separate.
:-) (Let me assure you that I perceive both of you as very distinct from one another.)
What I had in mind was the last line of the excerpt that follows:
From: Kent M Pitman <pit...@world.std.com> Subject: Re: Newbie Help Please: Reading into a list from a file Date: 1999/05/02 Message-ID: <sfw7lqs70yg....@world.std.com> Newsgroups: comp.lang.lisp References: <37266b90.3567...@news3.newscene.com> <3134473642124...@naggum.no> <7gcnek$29...@nnrp1.dejanews.com> <3134571515815...@naggum.no>
Erik Naggum <e...@naggum.no> writes: (...) > really? I control the meaning of whitespace by modifying the readtable > in an application. e.g., I have made all control characters into > whitespace _except_ newline, which is an important delimiter in my data > stream. in what way does this not work? </tongue-in-cheek>
The above text is either mine or very like something I said.
(end of excerpt).
> | Perhaps it would be nice if the fact that the meaning of whitespace[2] > | depends on the current readtable and not on the standard readtable was > | made a little more explicit in the spec (...)
> I find this to be sufficiently basic to the language design that it would > actually be confusing to add it in particular places -- one would have to > wonder why it was added.
Perhaps you are right, and this is a matter for a commentary on the standard.
> | Also, perhaps a standard function (WHITESPACEP &OPTIONAL readtable) that > | tests this attribute could be useful (I know a kludge that does this, but > | it is rather inelegant).
> your implementation may sport just such functions, or they might be > macros or accessors inlined so strongly that they are not retained in the > dumped image.
If implementations provide such a function, why not include it in the standard?
(If they don't provide an appropriate readtable accessor, then the user can do---as far as my ingenuity goes---only something along the following line: convert the character into a string and call READ-FROM-STRING. If the eof-value is returned, then the character is whitespace in the current readtable, otherwise an object would be returned or an error would be signalled. Pretty kludgy to me.)
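The kludge described above might be sketched as follows. The name KLUDGE-WHITESPACEP is hypothetical, and the GET-MACRO-CHARACTER guard is an addition that is not part of the description: without it, a character such as ";" would be misreported, because its reader macro consumes the rest of the input and returns no values, so READ-FROM-STRING then hits end of file and returns the eof-value. Treat this as an illustration under those assumptions, not a robust predicate.

```lisp
;; Hypothetical sketch: true if CHAR appears to have whitespace syntax
;; in READTABLE, judged only by what the reader does with it.
(defun kludge-whitespacep (char &optional (readtable *readtable*))
  (let ((*readtable* readtable)
        (eof (list nil)))                 ; fresh object, unique eof marker
    (and
     ;; Added guard (assumption): macro characters are never whitespace.
     (not (get-macro-character char readtable))
     ;; A one-character string holding only whitespace reads as end of file.
     ;; Errors (e.g. for a lone "\" or "|") are treated as "not whitespace".
     (eq eof (ignore-errors
              (read-from-string (string char) nil eof))))))
```

Note one side effect of this approach: testing a constituent character such as #\a interns a symbol in the current package.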
-- Vassil Nikolov <vniko...@poboxes.com> www.poboxes.com/vnikolov (You may want to cc your posting to me if I _have_ to see it.) LEGEMANVALEMFVTVTVM (Ancient Roman programmers' adage.)