Newbie: input file with commas like this: (, ,)

a314658

Jan 25, 2009, 9:12:06 PM
One of the sexp's in a file that I'm trying to read has the following:

(foo (bar baz) (, ,) )

and when I try to read it in I get error: comma is illegal outside of
backquote. what is a way around that?

thanks,
-Victor Piousbox, a student

D Herring

Jan 25, 2009, 6:14:31 PM
a314658 wrote:
> One of the sexp's in a file that I'm trying to read has the following:
>
> (foo (bar baz) (, ,) )
>
> and when I try to read it in I get error: comma is illegal outside of
> backquote. what is a way around that?

What does foo expect the commas to be?
(foo (bar baz) (", ,"))
(foo (bar baz) (#\, #\,))
(foo (bar baz) (|, ,|))
...

- Daniel

a314658

Jan 25, 2009, 9:23:10 PM
Er... I can't change the dataset. But it's just a tree of constants, no
functions. It goes something like this:

( (S
(VP (VBD brought)
(PP
(NP (PRP him) ))
(NP
(NP (DT a) (NN mixture) )
(, ,)
(PP (IN of))))
(. .) ))

And so the (. .) and (, ,) part is giving me trouble.

Rainer Joswig

Jan 25, 2009, 6:57:04 PM
In article <497cf18e$0$3340$6e1e...@read.cnntp.org>,
a314658 <314...@gmail.com> wrote:

See readtables in Common Lisp.

To make , read as whitespace, use:

CL-USER 60 > (defparameter *data* "(foo (bar baz) (, ,) )")
*DATA*

CL-USER 61 > (defparameter *rt* (copy-readtable))
*RT*

CL-USER 62 > (set-syntax-from-char #\, #\space *rt* *rt*)
T

CL-USER 63 > (let ((*readtable* *rt*)) (read-from-string *data*))
(FOO (BAR BAZ) NIL)
22

You can also have them read as an object.
Here the character #\, reads as the symbol |,|.
The printer may also show that symbol as \, .

CL-USER 64 > (defparameter *rt* (copy-readtable))
*RT*

CL-USER 65 > (set-macro-character #\, (lambda (stream char) '|,|) nil *rt*)
T

CL-USER 66 > (let ((*readtable* *rt*)) (read-from-string *data*))
(FOO (BAR BAZ) (\, \,))
22


You should not change the existing readtable, but create your
own and change that - as I did in the example.

Then binding *readtable* to your readtable
around calls to the read functions (read, read-from-string, ...)
will make them use the new readtable - thanks to dynamic
binding.
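
As a minimal sketch of that advice (the function names MAKE-COMMA-READTABLE and READ-ALL-FORMS are made up for illustration, and binding *READ-EVAL* to NIL is an extra precaution not mentioned above), a private readtable can be built once and bound only around the calls to READ:

```lisp
;; Hypothetical helper: a fresh readtable in which , reads as the symbol |,|.
(defun make-comma-readtable ()
  (let ((rt (copy-readtable nil)))      ; NIL means: copy the standard readtable
    (set-macro-character #\,
                         (lambda (stream char)
                           (declare (ignore stream char))
                           '|,|)
                         nil rt)
    rt))

;; Hypothetical helper: read every form from STREAM under READTABLE.
(defun read-all-forms (stream readtable)
  (let ((*readtable* readtable)         ; dynamic binding, undone when LET exits
        (*read-eval* nil)               ; assumption: the data needs no #. evaluation
        (eof (gensym "EOF")))           ; unique end-of-file marker
    (loop for form = (read stream nil eof)
          until (eq form eof)
          collect form)))
```

Calling (with-input-from-string (s "(foo (bar baz) (, ,)) (qux)") (read-all-forms s (make-comma-readtable))) should return a list of two forms, with each comma read as the symbol |,|. The global readtable is never modified.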

--
http://lispm.dyndns.org/

Pascal J. Bourguignon

Jan 25, 2009, 7:07:21 PM
a314658 <314...@gmail.com> writes:

> Er... I can't change the dataset. But it's just a tree of constants,
> no functions. It goes something like this:
>
> ( (S
> (VP (VBD brought)
> (PP
> (NP (PRP him) ))
> (NP
> (NP (DT a) (NN mixture) )
> (, ,)
> (PP (IN of))))
> (. .) ))
>
> And so the (. .) and (, ,) part is giving me trouble.

If you cannot change the data, then you will have to implement your
own reader (or use a library reader), because unfortunately the
Common Lisp standard doesn't allow changing the constituent traits of
characters, so nothing can be done to stop CL:READ from treating a
lone dot as an invalid token. For the comma, you can just remove the
reader macro normally attached to it. But for the dot, you need to
replace the whole reader.

(It would probably be much easier to copy your data file substituting
these characters:

sed -e 's/\([.,]\)/\\\1/g' < data > data-for-cl

and use data-for-cl with the normal readtable, but if you cannot
change the dataset, perhaps you are forbidden to do that too...)
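
The same substitution can be sketched inside Lisp, without touching the file on disk. This is only an illustration: ESCAPE-PUNCTUATION is a made-up name, and like the sed command it escapes every dot and comma, so a token such as 3.5 would come back as a symbol rather than a number.

```lisp
;; Hypothetical helper: put a backslash in front of every . and , in TEXT,
;; mirroring the sed substitution above.
(defun escape-punctuation (text)
  (with-output-to-string (out)
    (loop for ch across text
          do (when (member ch '(#\. #\,))
               (write-char #\\ out))
             (write-char ch out))))

;; The escaped text can then be read with the normal readtable.
(defun read-escaped (text)
  (let ((*read-eval* nil))              ; assumption: plain data, no #. needed
    (values (read-from-string (escape-punctuation text)))))
```

For example, (read-escaped "(foo (, ,) (. .))") should yield a list whose second and third elements each hold two copies of the symbols |,| and |.| respectively.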


Have a look at:
http://darcs.informatimago.com/lisp/common-lisp/reader.lisp
http://www.informatimago.com/develop/lisp/index.html

--
__Pascal Bourguignon__

jos...@corporate-world.lisp.de

Jan 25, 2009, 7:24:58 PM
On 26 Jan., 01:07, p...@informatimago.com (Pascal J. Bourguignon)
wrote:

(defparameter *data* "(foo (bar baz) (, ,) (. .))")
(defparameter *rt* (copy-readtable))

(set-macro-character #\. (lambda (stream char) '|.|) nil *rt*)


(set-macro-character #\, (lambda (stream char) '|,|) nil *rt*)

(set-macro-character #\( (lambda (stream char) (read-delimited-list #\) stream)) t *rt*)

(let ((*readtable* *rt*))
  (with-input-from-string (s *data*)
    (read s)))

-> (FOO (BAR BAZ) (\, \,) (\. \.))

Works for me...
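
The three definitions above can be collected into one constructor. The following is a sketch, not the code from this post: the names MAKE-TREEBANK-READTABLE and READ-TREEBANK-STRING are invented, ( is installed as a terminating macro character (as in the standard readtable), and the :PRESERVE case setting is an added assumption so that words like "brought" keep their lower case. Because . and , terminate tokens here, something like 1,000 or 3.5 would be split into several objects.

```lisp
;; Hypothetical constructor for a readtable suited to trees like the one above.
(defun make-treebank-readtable ()
  (let ((rt (copy-readtable nil)))
    ;; Assumption: keep the case of the data instead of upcasing it.
    (setf (readtable-case rt) :preserve)
    (flet ((constant-reader (symbol)
             ;; Reader macro function that ignores its input and returns SYMBOL.
             (lambda (stream char)
               (declare (ignore stream char))
               symbol)))
      (set-macro-character #\, (constant-reader '|,|) nil rt)
      (set-macro-character #\. (constant-reader '|.|) nil rt))
    ;; Replace the list reader so that it never looks for a consing dot.
    (set-macro-character #\(
                         (lambda (stream char)
                           (declare (ignore char))
                           (read-delimited-list #\) stream t))
                         nil rt)
    rt))

;; Hypothetical helper: read one tree from STRING with that readtable.
(defun read-treebank-string (string)
  (let ((*readtable* (make-treebank-readtable))
        (*read-eval* nil))
    (values (read-from-string string))))
```

With this, (read-treebank-string "( (S (VP (VBD brought)) (, ,) (. .)) )") should return a nested list in which the tags are upper-case symbols, "brought" is the symbol |brought|, and the punctuation nodes contain |,| and |.|.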

Pascal J. Bourguignon

Jan 25, 2009, 8:44:35 PM
"jos...@corporate-world.lisp.de" <jos...@corporate-world.lisp.de> writes:

Yes, it works here too. I must have done something wrong in my own tests...

--
__Pascal Bourguignon__

Kaz Kylheku

Jan 25, 2009, 9:01:13 PM
On 2009-01-26, a314658 <314...@gmail.com> wrote:
> Er... I can't change the dataset. But it's just a tree of constants, no
> functions. It goes something like this:
>
> ( (S
> (VP (VBD brought)
> (PP
> (NP (PRP him) ))
> (NP
> (NP (DT a) (NN mixture) )
> (, ,)
> (PP (IN of))))
> (. .) ))
>
> And so the (. .) and (, ,) part is giving me trouble.

What D Herring is asking is what these commas and periods represent.

You can't solve this problem if you don't know what these are.

Is it okay to just discard these forms, so they turn into nothing?

If they mean nothing, why are they in the data?


Let's put it this way: can you fill in this blank?

Syntax:      Reads as this Lisp object:

a            A
(a . nil)    (A)
#xFF         255
(, ,)        ______ ???


If /you/ can't fill in this blank, how can you write a computer
program which fills in the blank?

a314658

Jan 26, 2009, 9:48:43 PM

It works! I feared that I'd have to read the files a char at a time,
or do pre-parsing, which would not be graceful. Your solution is
graceful. Thanks a lot ;-)

-Victor

a314658

Jan 26, 2009, 9:54:40 PM

Discarding data is a very subtle business ;-) I'd be content with
translating (, ,) as (\, \,) and (. .) as (\. \.).

Kaz Kylheku

Jan 26, 2009, 7:27:17 PM
On 2009-01-27, a314658 <314...@gmail.com> wrote:
> Discarding data is a very subtle business ;-) I'd be content with
> translating (, ,) as (\, \,) and (. .) as (\. \.).

I.e. commas and periods are token constituent characters. Easily arranged in
the readtable.
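
For the comma, one way to arrange this is to copy the syntax of an ordinary letter onto it. The sketch below assumes that is what is meant; MAKE-CONSTITUENT-COMMA-READTABLE is a made-up name. It covers the comma only: the dot is already a constituent, but a token made up solely of unescaped dots is still rejected by the standard reader, so (. .) needs separate treatment.

```lisp
;; Hypothetical constructor: a readtable in which , is an ordinary
;; token constituent, like a letter.
(defun make-constituent-comma-readtable ()
  (let ((rt (copy-readtable nil)))
    ;; Give , in RT the syntax that #\a has in the standard readtable.
    (set-syntax-from-char #\, #\a rt)
    rt))

;; Hypothetical helper: read one object from STRING with that readtable.
(defun read-with-constituent-comma (string)
  (let ((*readtable* (make-constituent-comma-readtable))
        (*read-eval* nil))
    (values (read-from-string string))))
```

A difference from the macro-character approach is that a comma inside a token no longer ends the token, so (read-with-constituent-comma "(foo (, ,) 1,000)") should keep 1,000 together as a single symbol.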

Any tricky cases? Could the data contain instances of the dot which are in fact
consing dot notation and must be treated as such?

Is the lack of whitespace, in your examples, between the left or right
parenthesis, and the following or leading dot, significant? Does ( . . ) mean
something different from (. .) ?

Also, what is the lexical analysis to make of this?

(. .)
\ /
/ \
( v )

Hmm ....

Robert Maas

Jan 28, 2009, 9:19:46 AM
> From: a314658 <314...@gmail.com>

> One of the sexp's in a file that I'm trying to read has the following:
> (foo (bar baz) (, ,) )

I don't know what you mean by "sexp", but that's not a valid
s-expression in any version of lisp I've ever seen/used. You need
to contact whoever made that file to find out what his/her
intention was. Only when you know the intention of those commas
will you be able to convert them to something usable in CL.

If this is a class assignment, ask your instructor.
If this is a work assignment, ask your boss/supervisor.
If your instructor/boss/supervisor refuses to tell you the
intention of those commas, but still expects you to deal with them,
file a grievance.

If this is a random file you found on the net, you're wasting your
time, and our time.
If this is a file you generated yourself, please tell us how you
generated it.

Pascal J. Bourguignon

Jan 28, 2009, 10:13:53 AM

>> From: a314658 <314...@gmail.com>
>> One of the sexp's in a file that I'm trying to read has the following:
>> (foo (bar baz) (, ,) )
>
> I don't know what you mean by "sexp", but that's not a valid
> s-expression in any version of lisp I've ever seen/used.

Old Lisps did not yet have backquote, so you could name a symbol with
a single comma. As for the dot, I don't remember whether LISP 1.5
already had the pair syntax; I'll have to check the sources.


> You need
> to contact whoever made that file to find out what his/her
> intention was. Only when you know the intention of those commas
> will you be able to convert them to something usable in CL.
>
> If this is a class assignment, ask your instructor.
> If this is a work assignment, ask your boss/supervisor.
> If your instructor/boss/supervisor refuses to tell you the
> intention of those commas, but still expects you to deal with them,
> file a grievance.


Customer is king. You don't file grievances when a customer comes
with his requirements, however strange they may seem to you.


> If this is a random file you found on the net, you're wasting your
> time, and our time.

Not really, CL is perfectly able to read this file, as other answers
demonstrated.


> If this is a file you generated yourself, please tell us how you
> generated it.

--
__Pascal Bourguignon__

Andrew Philpot

Jan 28, 2009, 10:58:24 AM
> http://rem.intarweb.org?tinyurl.com?uh...@MaasInfo.Org (Robert Maas) writes:
>
>>> From: a314658 <314...@gmail.com>
>>> One of the sexp's in a file that I'm trying to read has the following:
>>> (foo (bar baz) (, ,) )
>>
>> You need
>> to contact whoever made that file to find out what his/her
>> intention was. Only when you know the intention of those commas
>> will you be able to convert them to something usable in CL.

From inspection, it's Penn treebank (parsed natural language). CARs
are tags, CADRs are subtrees or leaves (lexemes), etc. The first ","
thus is the class (tag) of all comma-like punctuation sequences, the
second is the actual comma encountered. These files also often have
bare : and ; and . characters, and may have the lexemes "'s" and "n't"
and don't forget the embedded comma as digits separator "1,000". I've
also seen extended versions where the lexeme tokens may themselves
include colons, weirdly escaped whitespace, potnums, etc.
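
Given that layout, pulling the tagged words out of a tree that has already been read is a short recursion. This is a sketch under the assumption that a leaf node has the shape (TAG lexeme) and that the outermost list may have no tag of its own; COLLECT-LEAVES is a made-up name.

```lisp
;; Hypothetical helper: return a list of (TAG . LEXEME) pairs, left to right.
(defun collect-leaves (node)
  (cond ((atom node) nil)
        ;; Leaf node: a tag followed only by atoms, e.g. (VBD brought).
        ;; Only the first lexeme after the tag is kept.
        ((and (symbolp (first node))
              (rest node)
              (every #'atom (rest node)))
         (list (cons (first node) (second node))))
        ;; Inner node: skip the tag if there is one, then recurse.
        (t (loop for child in (if (symbolp (first node))
                                  (rest node)
                                  node)
                 append (collect-leaves child)))))
```

For example, (collect-leaves '((S (VP (VBD brought) (NP (PRP him))) (|.| |.|)))) should give three pairs: VBD with BROUGHT, PRP with HIM, and the period tag with the period itself.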

I've read simple versions of these successfully with the stock CL
reader by modifying the syntax of a few characters, but now I use
Pascal Bourguignon's portable CL reader with a custom token
recognizer.

--
Andrew Philpot
USC Information Sciences Institute
phi...@isi.edu
