You should not change the existing readtable, but create your own and change that - as I did in the example.
Then binding *readtable* to your readtable around calls to the read functions (read, read-from-string, ...) will make them use the new readtable - thanks to dynamic binding.
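A minimal sketch of that pattern follows (this is not the example referred to above). The names `*data-readtable*` and `read-data-from-string` are made up for illustration, and the `:preserve` setting is just one arbitrary change to show that only the copy is affected:

```lisp
;; Hypothetical names; the readtable-case change is only for illustration.
(defparameter *data-readtable*
  (let ((rt (copy-readtable nil)))        ; NIL = copy the standard readtable
    (setf (readtable-case rt) :preserve)  ; modify the copy, not the global one
    rt))

(defun read-data-from-string (string)
  "Read one object from STRING using *DATA-READTABLE*."
  (let ((*readtable* *data-readtable*)    ; dynamic binding, undone on exit
        (*read-eval* nil))                ; do not honor #. in foreign data
    (read-from-string string)))

;; (symbol-name (read-data-from-string "brought"))  ; => "brought"
```

Outside the `let`, `*readtable*` keeps whatever value it had before, so the rest of the program still reads with the usual syntax.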
> And so the (. .) and (, ,) part is giving me trouble.
If you cannot change the data, then you will have to implement your own reader (or use a library reader), because unfortunately the Common Lisp standard doesn't allow changing the constituent traits of characters, so nothing can be done to make CL:READ treat a lone dot as anything but an invalid token. For the comma, you can just remove the reader macro normally attached to it. But for the dot, you need to change the whole reader.
(It would probably be much easier to copy your data file substituting these characters:
sed -e 's/\([.,]\)/\\\1/g' < daat > data-for-cl
and use data-for-cl with the normal readtable, but if you cannot change the dataset, perhaps you are forbidden to do that too...)
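For the comma part, removing the reader macro could be sketched like this. It is only a sketch: `*comma-readtable*` and `read-with-plain-commas` are hypothetical names, and giving the comma the syntax of `a` is one way among others to turn it into an ordinary constituent:

```lisp
;; Hypothetical names; a sketch of making #\, an ordinary token character.
(defparameter *comma-readtable*
  (let ((rt (copy-readtable nil)))
    ;; Give #\, the syntax type of #\a (constituent) in the copy only.
    (set-syntax-from-char #\, #\a rt)
    rt))

(defun read-with-plain-commas (string)
  "Read one object from STRING, treating commas as symbol characters."
  (let ((*readtable* *comma-readtable*)
        (*read-eval* nil))
    (read-from-string string)))

;; (read-with-plain-commas "(foo (bar baz) (, ,) )")
;; => (FOO (BAR BAZ) (|,| |,|))
```

This does nothing for the dot, which is still rejected for the reason given above.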
jos...@corporate-world.lisp.de wrote:
> On 26 Jan., 01:07, p...@informatimago.com (Pascal J. Bourguignon) wrote:
>> a314658 <314...@gmail.com> writes:
>>> Er... I can't change the dataset. But it's just a tree of constants, no functions. It goes something like this:
>>> ( (S
>>>     (VP (VBD brought)
>>>       (PP
>>>         (NP (PRP him) ))
>>>       (NP
>>>         (NP (DT a) (NN mixture) )
>>>         (, ,)
>>>         (PP (IN of))))
>>>     (. .) ))
>>> And so the (. .) and (, ,) part is giving me trouble.
>> If you cannot change the data, then you will have to implement your own reader (or use a library reader), because unfortunately, the Common Lisp standard doesn't allow changing the constituent traits of characters, therefore nothing can be done to read the dot not as an invalid token by CL:READ. For the comma, you can just remove the reader macro normally attached to it. But for the dot, you need to change the whole reader.
>> (It would probably be much easier to copy your data file substituting these characters:
>> sed -e 's/\([.,]\)/\\\1/g' < daat > data-for-cl
>> and use data-for-cl with the normal readtable, but if you cannot change the dataset, perhaps you are forbidden to do that too...)
It works! I feared that I'd have to read the files a char at a time, or do pre-parsing, which would not be graceful. Your solution is graceful. Thanks a lot ;-)
> Discarding data is a very subtle business ;-) I'd be content with translating (, ,) as (\, \,) and (. .) as (\. \.).
I.e. commas and periods are token constituent characters. Easily arranged in the readtable.
Any tricky cases? Could the data contain instances of the dot which are in fact consing dot notation and must be treated as such?
Is the lack of whitespace, in your examples, between the left or right parenthesis, and the following or leading dot, significant? Does ( . . ) mean something different from (. .) ?
Also, what is the lexical analysis to make of this?
> From: a314658 <314...@gmail.com>
> One of the sexp's in a file that I'm trying to read has the following:
> (foo (bar baz) (, ,) )
I don't know what you mean by "sexp", but that's not a valid s-expression in any version of lisp I've ever seen/used. You need to contact whoever made that file to find out what his/her intention was. Only when you know the intention of those commas will you be able to convert them to something usable in CL.
If this is a class assignment, ask your instructor. If this is a work assignment, ask your boss/supervisor. If your instructor/boss/supervisor refuses to tell you the intention of those commas, but still expects you to deal with them, file a grievance.
If this is a random file you found on the net, you're wasting your time, and our time. If this is a file you generated yourself, please tell us how you generated it.
http://rem.intarweb.org?tinyurl.com?u...@MaasInfo.Org (Robert Maas) writes:
>> From: a314658 <314...@gmail.com>
>> One of the sexp's in a file that I'm trying to read has the following:
>> (foo (bar baz) (, ,) )
> I don't know what you mean by "sexp", but that's not a valid s-expression in any version of lisp I've ever seen/used.
Old Lisps did not have backquote yet, so you could name a symbol with a single comma. As for the dot, I don't remember whether LISP 1.5 already had the dotted pair syntax; I'll have to check the sources.
> You need to contact whoever made that file to find out what his/her intention was. Only when you know the intention of those commas will you be able to convert them to something usable in CL.
> If this is a class assignment, ask your instructor.
> If this is a work assignment, ask your boss/supervisor.
> If your instructor/boss/supervisor refuses to tell you the intention of those commas, but still expects you to deal with them, file a grievance.
Customer is king. You don't file grievances when a customer comes with his requirements, however strange they may seem to you.
> If this is a random file you found on the net, you're wasting your time, and our time.
Not really, CL is perfectly able to read this file, as other answers demonstrated.
> If this is a file you generated yourself, please tell us how you generated it.
>>> From: a314658 <314...@gmail.com>
>>> One of the sexp's in a file that I'm trying to read has the following:
>>> (foo (bar baz) (, ,) )
>> You need to contact whoever made that file to find out what his/her intention was. Only when you know the intention of those commas will you be able to convert them to something usable in CL.
From inspection, it's Penn Treebank (parsed natural language). CARs are tags, CADRs are subtrees or leaves (lexemes), etc. The first "," is thus the class (tag) of all comma-like punctuation sequences, and the second is the actual comma encountered. These files also often have bare : and ; and . characters, and may have the lexemes "'s" and "n't", and don't forget the embedded comma as a digit separator, as in "1,000". I've also seen extended versions where the lexeme tokens may themselves include colons, weirdly escaped whitespace, potnums, etc.
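To make that layout concrete, here is a small sketch that walks such a tree once it has been read and collects the (tag . lexeme) pairs. The function name `treebank-leaves` and the variable `*sample-tree*` are hypothetical, and the sample is written with escaped symbols (`|,|` and `|.|`) so that it loads under the standard readtable:

```lisp
;; Hypothetical helper; assumes a leaf looks like (TAG LEXEME) with both atoms.
(defun treebank-leaves (node)
  "Return a list of (tag . lexeme) pairs for the leaves of NODE, left to right."
  (cond ((atom node) '())                       ; a bare tag contributes nothing
        ((and (symbolp (first node))
              (rest node)
              (atom (second node))
              (null (cddr node)))
         (list (cons (first node) (second node))))
        (t (mapcan #'treebank-leaves node))))   ; recurse into the subtrees

(defparameter *sample-tree*
  '((S (VP (VBD brought)
           (PP (NP (PRP him)))
           (NP (NP (DT a) (NN mixture))
               (|,| |,|)
               (PP (IN of))))
       (|.| |.|))))

;; (mapcar #'cdr (treebank-leaves *sample-tree*))
;; => (BROUGHT HIM A MIXTURE |,| OF |.|)
```

The punctuation leaves come out as ordinary pairs whose tag and lexeme happen to be the same symbol.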
I've read simple versions of these successfully with the stock CL reader by modifying the syntax of a few characters, but now I use Pascal Bourguignon's portable CL reader with a custom token recognizer.
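For the record, one way the "modify the syntax of a few characters" route might look is sketched below. This is a guess at the technique, not the code actually used, and all names in it are hypothetical. It assumes that every dot in the data is a lexeme and never the consing dot, and it replaces the `(` reader macro as well, since an implementation's built-in list reader may check for a lone dot itself. It has not been checked against every implementation.

```lisp
;; Hypothetical sketch: dots become symbols, consing-dot notation is lost.
(defun read-token-as-symbol (stream char)
  "Collect CHAR and the characters up to whitespace or a parenthesis
and return them as a symbol (case is kept as written)."
  (let ((chars (list char)))
    (loop for next = (peek-char nil stream nil nil t)
          while (and next
                     (not (member next '(#\Space #\Tab #\Newline #\Return
                                         #\( #\)))))
          do (push (read-char stream t nil t) chars))
    (intern (coerce (nreverse chars) 'string))))

(defun read-plain-list (stream char)
  "List reader with no special treatment of the dot."
  (declare (ignore char))
  (read-delimited-list #\) stream t))

(defparameter *treebank-readtable*
  (let ((rt (copy-readtable nil)))
    (set-syntax-from-char #\, #\a rt)                      ; comma: constituent
    ;; Non-terminating, so a dot inside a token such as U.S. or 1.5 is untouched.
    (set-macro-character #\. #'read-token-as-symbol t rt)
    (set-macro-character #\( #'read-plain-list nil rt)
    rt))

(defun read-treebank-from-string (string)
  "Read one tree from STRING with *TREEBANK-READTABLE*."
  (let ((*readtable* *treebank-readtable*)
        (*read-eval* nil))
    (read-from-string string)))

;; Usage: (read-treebank-from-string "( (S (NP (DT a) (NN mixture) ) (, ,) (. .) ))")
```

Colons, semicolons and quote characters in the data would need similar treatment, which is where a reader with a custom token recognizer starts to look more attractive.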
-- Andrew Philpot USC Information Sciences Institute phil...@isi.edu