parsing lisp expression

Xah

unread,

Dec 19, 1997, 3:00:00 AM12/19/97

to

Is it feasible to use Perl to parse Lisp expression?

For simplicity, let's say all atoms are just alphanumerics. Here is a sample
expression,

((f 1 19) f2 (g h (2 5 9) 3))

I want to bring out any of such expression's head outside the parenthesis
and then change the paren to brackets, e.g. (f 1 2 3) becomes f[1 2 3]. The
above sample becomes

f[1 19][f2 g[h 2[5 9] 3]]

and I also want to do the reverse operation. I'm wondering if it's feasible,
say, by a Perl guru in a week? If not, is there in general a tool which I
can do this?

I don't know any parser or compiler theory. Is it possible for me to learn
something like lex/yacc to be able to do what I want? All I need is to
transform the *string*. For the curious, I am doing this to explore tree
forms between Scheme and Mathematica.

Thanks in advance for any help.

Xah, x...@best.com
http://www.best.com/~xah/Wallpaper_dir/c0_WallPaper.html
Mountain View, CA, USA
"Perl my ass" (there are more than one way to express love)

Stig Hemmer

unread,

Dec 19, 1997, 3:00:00 AM12/19/97

to

"Xah" <x...@best.com> writes:
[wants to convert both ways between]

> ((f 1 19) f2 (g h (2 5 9) 3))

[and]

> f[1 19][f2 g[h 2[5 9] 3]]

> I'm wondering if it's feasible, say, by a Perl guru in a week?

While I'm not a Perl guru, I would think this is much less than a
weeks work.

> For the curious, I am doing this to explore tree forms between
> Scheme and Mathematica.

My naive guess is that it would be simpler is convince Scheme to read
and write Mathematica forms rather than going via Perl.

E.g.

(define (mm-print obj)
(cond ((null? obj) nil)
((symbol? obj) (write obj))
(#t (begin (mm-print (car obj))
(write '[)
(for-each mm-print (cdr obj))
(write '])))))

I don't have a scheme instance handy for testing, so this probably buggy.
The idea was to show a scheme list in Mathematica form.

The corresponding mm-read would be more difficult (at least for me :)
It might be easier to write a scheme-print in Mathematica.

If you want more help, I suggest the groups comp.lang.scheme,
comp.soft-sys.math.mathematica or comp.lang.perl depending on your
question.

Stig Hemmer.

Tim Bradshaw

unread,

Dec 19, 1997, 3:00:00 AM12/19/97

to

* Xah wrote:
> Is it feasible to use Perl to parse Lisp expression?
> For simplicity, let's say all atoms are just alphanumerics. Here is a sample
> expression,

> ((f 1 19) f2 (g h (2 5 9) 3))

> I want to bring out any of such expression's head outside the parenthesis

> and then change the paren to brackets, e.g. (f 1 2 3) becomes f[1 2 3]. The
> above sample becomes

> f[1 19][f2 g[h 2[5 9] 3]]

Doing this direction in Lisp is pretty easy:

(defgeneric dragout (form stream))

(defmethod dragout ((form list) stream)
(dragout (car form) stream)
(princ "[" stream)
(mapl #'(lambda (tail)
(dragout (car tail) stream)
(when (cdr tail)
(princ " " stream)))
(cdr form))
(princ "]" stream)
form)

(defmethod dragout ((form t) stream)
(prin1 form stream)
form)

Doing the reverse shouldn't be much harder I think (you might have toi
hack readtables), so I see no need for perl.

--tim

Cedric Adjih

unread,

Dec 19, 1997, 3:00:00 AM12/19/97

to

Xah <x...@best.com> wrote:
: Is it feasible to use Perl to parse Lisp expression?
:
: For simplicity, let's say all atoms are just alphanumerics. Here is a sample
: expression,
:
: ((f 1 19) f2 (g h (2 5 9) 3))
:
: I want to bring out any of such expression's head outside the parenthesis
: and then change the paren to brackets, e.g. (f 1 2 3) becomes f[1 2 3]. The
: above sample becomes
:
: f[1 19][f2 g[h 2[5 9] 3]]

:
: and I also want to do the reverse operation. I'm wondering if it's feasible,
: say, by a Perl guru in a week? If not, is there in general a tool which I
: can do this?

Actually, as a test, I have written the program at the end of the post in
Python, and its seems to work (for me :-)):

johan:~ $ python parse-lisp.py
scheme=( (f 1 19) f2 (g h (2 5 9)3))
canonical=((f 1 19) f2 (g h (2 5 9) 3))
as mathematica=f[1 19][f2 g[h 2[5 9] 3]]
back as scheme=((f 1 19) f2 (g h (2 5 9) 3))
consistency=yes

I'm sorry, I'm not fluent enough in Perl or Scheme to translate it.
The hard part isn't the parsing actually, but the algorithm of the
translation from Mathematica to Scheme:

mathematica=f[1][2][3][4][5][6][7][8]
should be translated in scheme to ((((((((f 1) 2) 3) 4) 5) 6) 7) 8)
(?if I have correctly understood the convertion)
so you'd basically to take in account characters up to
mathematica=f[1][2][3][4][5][6][7][8]
^(here)
in order to know at which level of parenthesis f should be.

I did this:
- change the mathematica expression into a tree
(which would be a sexp in scheme)
mathematica string "f[1 19][f2 g[h 2[5 9] 3]]" is parsed as:
=> (f (1 19) (f2 g (h 2 (5 9) 3))) {if the program was in scheme}
["f",["1","19"],["f2","g",[,"h","2",["5","9"],"3"]]] {actually in Python}
- reverse the tree:
(((3 (9 5) 2 h) g f2) (19 1) f)
- recursively correct the tree:
when operating on a list l:
. correct the tail of the list (cdr l)
. if (car l) is a list, move the first element of the corrected tail to
(car l) ; rebuild the list
i.e. ((3) 4 ..... ) -> ((3 4) ..... )
- Note that ((3) (4) f) -> ((3 (4 f))) because ((4) f) becomes ((4 f)) first

=> at the end: (((3 (9 5 2) h g) f2 (19 1 f)))
- when finished, reverse again the tree
(((f 1 19) f2 (g h (2 5 9) 3)))

Probably there are a better algorithms...

: I don't know any parser or compiler theory. Is it possible for me to learn

: something like lex/yacc to be able to do what I want? All I need is to

: transform the *string*. For the curious, I am doing this to explore tree

: forms between Scheme and Mathematica.

The best language to write this is CL or Scheme [apart from Python
of course :-)], because tree walking is inherently recursive.
It is quite possible that someone could write a 10 lines
super-clever algorithm/implementation maybe with CPS.

The python code is probably translatable nearly one-to-one to Perl 5,
but the result might look a bit scary.

Lex + C wouldn't be fun to program.

An alternative is to program the parsing mathematica->scheme forms
in any language, and to do the correction in Lisp.

################ cut here

#! /usr/bin/env python

# {remark: tree[0] means (car tree) and tree[1:] means (cdr tree)}

import string,regsub

#---------------------------------------------------------------------------
# Parsing Lisp

def parseForm(form,separatorRegex=')\|('):
"""parse a form: from a string return a list with '(',')' and atoms"""
formSpaceSplitted=regsub.split(form,'[ \t\n]+') # split space-separated
# split again, based on '(', ')' delimitors. Atoms appear here:
rawParsedForm=[]
for element in formSpaceSplitted:
rawParsedForm=rawParsedForm+regsub.splitx(element,separatorRegex)
parsedForm=filter(lambda x: x!='', rawParsedForm) # Remove spurious '':
return parsedForm

def buildTreeInternal(form,openingParen,closingParen):
"""Build a tree (imbricated lists) from parsed forms
return (tree, unparsed remaining form)"""
if form[0] != openingParen: raise "Form doesn't begin with '('", form
form=form[1:]
result=[]
while form:
c=form[0]
if c==openingParen:
subList,form=buildTreeInternal(form,openingParen,closingParen)
result.append(subList)
elif c==closingParen:
return result, form[1:]
else:
result.append(c)
form=form[1:]
raise "Unbalanced parentheses, ')' missing"

def buildTree(form,openingParen='(',closingParen=')'):
return buildTreeInternal(form,openingParen,closingParen)[0]

#---------------------------------------------------------------------------
# Tree manipulation

TypeAtom=type("")
TypeList=type([])

def reverseTree(tree):
"""Reverse a list and recursively its sublists"""
if tree==[] or type(tree) == TypeAtom: return tree
tree=tree[:] ; tree.reverse() # ((copy and destructively reverse))
result=[]
for i in tree:
result.append(reverseTree(i))
return result

def correctTree(tree):
"""Correct a mathematica-tree: extend every sublist to include the
following element"""
if type(tree) == TypeAtom or tree==[]: return tree
if type(tree[0])==TypeList:
if len(tree)<=1: raise "Assumption error: sublist too small"
tailList=correctTree(tree[1:])
return [ correctTree(tree[0])+[tailList[0]] ] + tailList[1:]
else: return [ tree[0] ] + correctTree(tree[1:])

#---------------------------------------------------------------------------
# tree -> language, string -> language convertion

def treeToScheme(tree):
"""From a tree, return a s-exp representation"""
if tree==[]: return "" # empty s-exp
if type(tree) == TypeAtom: return tree # atom
aRepr=string.joinfields(map(lambda x: treeToScheme(x),tree)," ")
return "("+aRepr+")"

def treeToMathematica(tree):
"""From a tree, return a mathematica(?) representation"""
if tree==[]: return "" # empty s-exp
if type(tree) == TypeAtom: return tree # atom
if type(tree[0]) == TypeList: headRepr=treeToMathematica(tree[0])
else: headRepr=tree[0]
tailRepr=string.joinfields(map(lambda x: treeToMathematica(x),tree[1:])," ")
if tailRepr!='': return headRepr+"["+tailRepr+"]"
else: return headRepr+"[]"

def schemeToTree(form):
return buildTree(parseForm(form))

def mathematicaToTree(form):
form="["+form+"]" # add enclosing [...]
uncorrectedTree=buildTree(parseForm(form,'\]\|\['),'[',']')
correctedTree=reverseTree(correctTree(reverseTree(uncorrectedTree)))
return correctedTree[0] # "remove" enclosing ( ... )

#---------------------------------------------------------------------------

schemeForm='( (f 1 19) f2 (g \t h \t\t (2 \t 5 9)3))'
tree=schemeToTree(schemeForm)

canonicalScheme=treeToScheme(tree)
canonicalSchemeReparsedTree=schemeToTree(canonicalScheme)

mathematicaForm=treeToMathematica(tree)
mathematicaReparsedTree=mathematicaToTree(mathematicaForm)
schemeFromMathematicaReparsing=treeToScheme(mathematicaReparsedTree)

consistent=(schemeFromMathematicaReparsing == canonicalScheme) \
and (canonicalSchemeReparsedTree == tree)
consistentString={0:"no", 1:"yes"}[consistent]

print """scheme=%(schemeForm)s
canonical=%(canonicalScheme)s
as mathematica=%(mathematicaForm)s
back as scheme=%(schemeFromMathematicaReparsing)s
consistency=%(consistentString)s
""" % vars()

#---------------------------------------------------------------------------

# -- Cedric Adjih, not speaking for INRIA, sorry for my English

Randal Schwartz

unread,

Dec 20, 1997, 3:00:00 AM12/20/97

to Xah

>>>>> "Xah" == Xah <x...@best.com> writes:

Xah> Is it feasible to use Perl to parse Lisp expression?

Dunno if it's feasible, but some darn fool (Just kidding, Gisle :-)
already has a fully functioning Lisp Interpreter in Perl submitted to
the CPAN. Says the README:

I just wanted to be able to extract the information that Gnus write to
its ~/.newsrc.eld file. In order to do this I ended up writing a
general Lisp reader. It reads textual lisp and returns a perl
structure that represents the Lisp objects. For instance

; this is a comment
(foo "foo" (+ 42))

Ends up as the following perl structure.

[symbol("foo"), "foo", [symbol("+") 42]]

Once I had this I just had to produce a Lisp printer, i.e. something
that takes structure like the one above and returns the textual lisp
representation of it. And then it was just a matter of a little
programming to turn on evaluation of these objects and a minimal Lisp
environment was born.

Find it at http://www.perl.org/CPAN/authors/id/GAAS/, somewhere near
"perl-lisp".

Just another language polyglot,

--
Name: Randal L. Schwartz / Stonehenge Consulting Services (503)777-0095
Keywords: Perl training, UNIX[tm] consulting, video production, skiing, flying
Email: <mer...@stonehenge.com> Snail: (Call) PGP-Key: (finger mer...@ora.com)
Web: <A HREF="http://www.stonehenge.com/merlyn/">My Home Page!</A>
Quote: "I'm telling you, if I could have five lines in my .sig, I would!" -- me

Erik Naggum

unread,

Dec 21, 1997, 3:00:00 AM12/21/97

to

* Tim Bradshaw

| Doing the reverse shouldn't be much harder I think (you might have toi
| hack readtables), so I see no need for perl.

this is not quite as easy as it seems. first, whether a read object is the
first element of a list or is itself, is only defined by the _following_
character, which is at odds with how the Lisp reader works. second, the
end of the input must be defined externally to the syntax for lists: once
we have read a token, we cannot return until the final token is either a
non-list (maybe a semicolon or a period?) or the end of the input (maybe
end of line?). third, while the singleton list is x[], there is no concept
of an empty list in that syntax, unless such it served by some other
special value (such as `nil'). all this means that reading that kind of
list is a quite different process from the normal `read-delimited-list'.

so I threw together this to demonstrate that it is possible to write a
moderately compact parser for a "botched syntax", using the Lisp reader for
the real work:

(defun read-botched-syntax (&optional stream (eof-error-p t) (eof-value nil))
(labels ((read-botched-list ()
(loop initially (read-char stream) ;discard #\[
until (eq (peek-char t stream) #\])
collect (read-botched-internal)
finally (read-char stream))) ;discard #\]
(read-botched-internal ()
(loop with first = (read stream)
for look-ahead = (peek-char t stream nil nil)
while look-ahead
while (eql look-ahead #\[)
do (setq first (cons first (read-botched-list)))
finally (return first))))
(let ((*readtable* (copy-readtable)))
;; make #\[ and #\] terminate tokens. #'identity is never called.
(set-macro-character #\[ #'identity nil)
(set-macro-character #\] #'identity nil)
(if (null (peek-char t stream eof-error-p nil))
eof-value
(read-botched-internal)))))

this will do lots of uninspiring things if much more than simple tokens are
present in the input, so a possible replacement that tries to limit itself
to tokens would go like this:

(read-botched-internal ()
(let* ((char (peek-char t stream))
(macro-function (get-macro-character char)))
;; heuristically determine whether this will be read as a token
;; this works in CMUCL and Allegro CL for Unix, not in CLISP
(if (or (null macro-function) ;this handles #\\
(eq macro-function (get-macro-character #\A)))
(loop with first = (read stream)
for look-ahead = (peek-char t stream nil nil)
while look-ahead
while (eql look-ahead #\[)
do (setq first (cons first (read-botched-list)))
finally (return first))
(error 'reader-error
:stream stream
:format-control "~@<Syntax error in ~S (character ~@C).~:@>"
:format-arguments (list stream char))))))

a more complete approach would be replacing the reader macro function for
all (relevant) characters with one's own token-reader, but that's just too
much work for now.

#\Erik
--
If you think this year is number 97, | Help fight MULE in GNU Emacs 20!
_you_ are not "Year 2000 Compliant". | http://sourcery.naggum.no/emacs/

John Porter

unread,

Dec 24, 1997, 3:00:00 AM12/24/97

to

You asked for it, you got it.
Two routines in Perl, one for lisp->mathematica, and one for
the opposite conversion.
However, I agree that a solution in Scheme would probably be
better.

sub lisp_to_mathematica {
my( $s ) = @_;
# add spaces around parens, because that's what we'll parse on:
$s =~ s/$/ ( /g;
$s =~ s/$/ ) /g;

my( $tree ) = [];
for ( split( " ", $s ) ) {
if ( $_ eq '(' ) {
unshift( @{$tree}, [] );
}
elsif ( $_ eq ')' ) {
my $tt = shift @{$tree};
push( @{$tree->[0]}, $tt );
}
else {
push( @{$tree->[0]}, $_ );
}
}

while ( $#{$tree} == 0 and ref($tree->[0]) ) { $tree = $tree->[0]; }
_to_mathematica( @{$tree} );
}

sub mathematica_to_lisp {
my( $s ) = @_;
# add spaces around brackets, because that's what we'll parse on:
$s =~ s/\[/ [ /g;
$s =~ s/\]/ ] /g;

my( $tree ) = [];
for ( split( " ", $s ) ) {
if ( $_ eq '[' ) {
my $h = pop @{$tree->[0]};
unshift( @{$tree}, [] );
push( @{$tree->[0]}, $h );
}
elsif ( $_ eq ']' ) {
my $tt = shift @{$tree};
push( @{$tree->[0]}, $tt );
}
else {
push( @{$tree->[0]}, $_ );
}
}

while ( $#{$tree} == 0 and ref($tree->[0]) ) { $tree = $tree->[0]; }
_to_lisp( @{$tree} );
}

sub _to_m_node {
my $n = shift;
ref($n) ? _to_mathematica( @{$n} ) : $n;
}

sub _to_mathematica {
my( $head, @tail ) = @_;
_to_m_node($head) . '[' . join(' ', map { _to_m_node($_) } @tail ) .
']';
}

sub _to_lisp {
my( @list ) = @_;
'(' . join( ' ', map { ref($_) ? _to_lisp( @{$_} ) : $_ } @list ) .
')';
}

#
# a simple test to prove the concept:
#
$s = '((f 1 19) f2 (g h (2 5 9) 3))';
print "Original=$s\n";
$mat = lisp_to_mathematica( $s );
print "Mathematica: $mat\n";
$lis = mathematica_to_lisp( $mat );
print "Lisp: $lis\n";

John Porter
jpo...@logicon.com

Xah wrote:
>
> Is it feasible to use Perl to parse Lisp expression?
>

> For simplicity, let's say all atoms are just alphanumerics. Here is a sample
> expression,
>
> ((f 1 19) f2 (g h (2 5 9) 3))
>
> I want to bring out any of such expression's head outside the parenthesis
> and then change the paren to brackets, e.g. (f 1 2 3) becomes f[1 2 3]. The
> above sample becomes
>
> f[1 19][f2 g[h 2[5 9] 3]]
>
> and I also want to do the reverse operation. I'm wondering if it's feasible,
> say, by a Perl guru in a week? If not, is there in general a tool which I
> can do this?
>

> I don't know any parser or compiler theory. Is it possible for me to learn
> something like lex/yacc to be able to do what I want? All I need is to
> transform the *string*. For the curious, I am doing this to explore tree
> forms between Scheme and Mathematica.
>