typed Shen-YACC and Montague Grammar

86 views
Skip to first unread message

Mark Tarver

unread,
Jan 30, 2021, 6:42:02 AM1/30/21
to Shen
I haven't the time to explain Shen-YACC with types but SP users might like to experiment
with the following example of a Montague grammar.  It will teach a lot.

Richard Montague developed this approach which compiles a source language (English) into an object language (logic) .  There are basically two types - t (term) and f (formula).  Montague's logic was complex and for this fragment of English we use a subset of Montague logic - function-free first-order logic without equality.    The logical constants are ~ v & => <=> e! a!,  the last two are the quantifiers.

Here is the type theory

(datatype t

   if (not (element? t [~ v & => <=> e! a!]))
   T : symbol;
   ________
   T : t;
   _______________
   (gensym v) : t;)

(datatype f

  F : t; T : (list t);
  ____________________
  [F | T] : f;

 (not (= F ~)) : verified;
  F : t, T : t >> P;
 ___________________
 [F T] : f >> P;

 (not (element? C [v & => <=>])) : verified;
 (not (element? F [e! a!])) : verified;
  F : t, T1 : t, T2 : t >> P;
 ____________________________
 [F T1 T2] : f >> P;

 P : f;
 ==========
 [~ P] : f;

  if (element? C [v & => <=>])
  P : f; Q : f;
  =============
  [P C Q] : f;

  X : t; P : f;
  =============
  [e! X P] : f;

  X : t; P : f;
  =============
  [a! X P] : f;)

Here is a Montague grammar in typed Shen-YACC.

(defcc <sent>
  {(list t) ==> f} 
  <np> <vp> := (<np> <vp>);)
  
(defcc <np>
  {(list t) ==> ((t --> f) --> f)}
  Name := (/. P (P Name))   where (name? Name);
  <det> <rcn> := (<det> <rcn>); 
  <det> <cn> := (<det> <cn>);)
  
(define name?
  {t --> boolean}
   Name -> (variable? Name)) 
  
(defcc <cn>
   {(list t) ==> (t --> f)}
   CN := (/. X [CN X])     where (common-noun? CN);)

(define common-noun?
   {t --> boolean}
    CN -> (element? CN [girl boy dog cat]))

(defcc <rcn>
   {(list t) ==> (t --> f)}
   <cn> that <vp> := (/. X [(<cn> X) & (<vp> X)]);
   <cn> that <np> <trans> := (/. X [(<cn> X) & (<np> (/. Y (<trans> Y X)))]);)

(defcc <vp>
   {(list t) ==> (t --> f)}
   <intrans>;
   <trans> <np> := (/. X (<np> (/. Y (<trans> X Y))));)

(defcc <intrans>
  {(list t) ==> (t --> f)}
    Intrans := (/. X [Intrans X])   where (intrans? Intrans);)

(define intrans?
   {t --> boolean}
    Intrans -> (element? Intrans [runs jumps walks]))

(defcc <trans>
   {(list t) ==> (t --> t --> f)}
    Trans := (/. X Y [Trans X Y])   where (trans? Trans);)

(define trans?
   {t --> boolean}
   Trans -> (element? Trans [likes greets admires]))
   
(defcc <det>
  {(list t) ==> ((t --> f) --> ((t --> f) --> f))}
   some := (let V (type (gensym v) t) (/. P Q [e! V [(P V) & (Q V)]]));
   every := (let V (type (gensym v) t) (/. P Q [a! V [(P V) => (Q V)]]));
   no      := (let V (type (gensym v) t) (/. P Q [a! V [(P V) => [~ (Q V)]]]));) 

Here it is in SP.

connecting to the web ...

Shen, copyright (C) 2010-2020 Mark Tarver
www.shenlanguage.org, Shen Professional Edition 28
running under Common Lisp, implementation: SBCL
port 3 ported by Mark Tarver
commercially licensed to Mark Tarver

(0+) t#type : symbol
f#type : symbol
(fn <sent>) : ((list t) ==> f)
(fn <np>) : ((list t) ==> ((t --> f) --> f))
(fn name?) : (t --> boolean)
(fn <cn>) : ((list t) ==> (t --> f))
(fn common-noun?) : (t --> boolean)
(fn <rcn>) : ((list t) ==> (t --> f))
(fn <vp>) : ((list t) ==> (t --> f))
(fn <intrans>) : ((list t) ==> (t --> f))
(fn intrans?) : (t --> boolean)
(fn <trans>) : ((list t) ==> (t --> (t --> f)))
(fn trans?) : (t --> boolean)
(fn <det>) : ((list t) ==> ((t --> f) --> ((t --> f) --> f)))

run time: 0.5459995269775391 secs

typechecked in 10766 inferences
loaded

(0+) (compile (fn <sent>) [no girl likes a dog])  
parse failure

(1+) (compile (fn <sent>) [no girl likes some dog])
[a! v12352 [[girl v12352] => [~ [e! v12354 [[dog v12354] & [likes v12352 v12354]
]]]]] : f

(2+) (compile (fn <sent>) [Mary likes John])
[likes Mary John] : f

(3+) (compile (fn <sent>) [every girl that jumps likes Bill])
[a! v12359 [[[girl v12359] & [jumps v12359]] => [likes v12359 Bill]]] : f

Mark


bruno arias

unread,
May 12, 2021, 1:31:56 PM5/12/21
to Shen
I was wondering does Shen support context-sensitive grammars?
On the above examples with Shen--Yacc it give the impression, since it looks like there are several non-terminal on the left-hand-side.

  <np> <vp> := ...

And even with a mix of terminals and non-terminals:

  <cn> that <vp> := ...

However reading the docs its clear that everything after := is part of the semantics and not the syntax,
and everything before := is actually what would have normally been on the LHS.
In other words the non-terminal being defined is the parameter being pass to defcc, eg: 

  (defcc <sent>

With that said, I tried experimenting with defmacro

  (1-) (defmacro infix [X + Y] -> [+ X Y])
  infix
  (2-) (+ 4 2)
  6
  (3-) (4 + 2)
  6

The problem is that AFAIK lisp macro expect s-expressions as inputs, ie this works so-so:

  \\ trying to define something more typical of function calls in other languages
  
  (0-) (defmacro myfunction ["explode(" X ")"] -> [explode X])
  myfunction
  (1-) ("explode(" "hello wolrd" ")")
  ["h" "e" "l" "l" "o" " " "w" "o" "l" "r" "d"]

But this less so:

  \\ now we have support for any function, but we have a space in-betweeen,
  \\ and the function also can only take one input
  (5-) (defmacro foo [F "(" X ")"] -> [F X])
  foo
  (6-) (explode "(" "hello world" ")")
   ["h" "e" "l" "l" "o" " " "w" "o" "l" "r" "d"]

Any advice on writing how to make this toy example better?
Also, any advice for writing  context-sensitive grammars in general?

Mark Tarver

unread,
May 12, 2021, 2:50:16 PM5/12/21
to Shen


On Wed, May 12, 2021 at 6:31 PM bruno arias <brunoa...@gmail.com> wrote:

I was wondering does Shen support context-sensitive grammars?
On the above examples with Shen--Yacc it give the impression, since it looks like there are several non-terminal on the left-hand-side.

  <np> <vp> := ...

Yes and no.   When you write 

(defcc <sent>
     <np> <vp> := ....)

you are asserting that <sent> can be expanded to <np> <vp> so what you have is in form a context-free grammar rule <sent> -> <np> <vp>.  What follows the := is the semantic action which tells the compiler what to do with the input.

However Shen YACC allows you the use of  guards which allow you to express languages which are CS; one such is the set of all sequences anbncn  for n >=1 which requires a CS grammar.  But this can be expressed in Shen YACC as:

(defcc <anbncn>
  <an> <bn> <cn> := (append <an> <bn> <cn>) where (let Lan (length <an>)
                                                                                                 Lbn (length <bn>)
                                                                                                 Lcn (length <cn>)
                                                                                                 (and (= Lan Lbn) (= Lbn Lcn)));)

 
(defcc <an>
  a <an>; a;)

(defcc <bn>
  b <bn>; b;)
 
(defcc <cn>
  c <cn>; c;)


Here is a test run in SP.

(10-) (compile (fn <anbncn>) [a a a b b b c c c])
[a a a b b b c c c]


(11-) (compile (fn <anbncn>) [a a a b b b c c])
parse failure

So yes, you can define CS languages by using an augmented CF notation.  You just don't use CS grammars to do it.

And even with a mix of terminals and non-terminals:

  <cn> that <vp> := ...

However reading the docs its clear that everything after := is part of the semantics and not the syntax,
and everything before := is actually what would have normally been on the LHS.
In other words the non-terminal being defined is the parameter being pass to defcc, eg: 

  (defcc <sent>

 Correct .

With that said, I tried experimenting with defmacro

  (1-) (defmacro infix [X + Y] -> [+ X Y])
  infix
  (2-) (+ 4 2)
  6
  (3-) (4 + 2)
  6

The problem is that AFAIK lisp macro expect s-expressions as inputs, ie this works so-so:

Yes; Shen is a Lisp, albeit a very sophisticated one, so it traffics in s-exprs and prefix notation.   This has well appreciated
advantages over writing in mixfix style and makes writing compilers much easier. 
 

  \\ trying to define something more typical of function calls in other languages
  
  (0-) (defmacro myfunction ["explode(" X ")"] -> [explode X])
  myfunction
  (1-) ("explode(" "hello wolrd" ")")
  ["h" "e" "l" "l" "o" " " "w" "o" "l" "r" "d"]

Right, the problem here is that a macro replaces one s-expr by another and you want to parse the input in a different way - more like ML.
To define an ML syntax you need either

1.  Use Shen YACC to define your own reader and use it to define your own loader.
2.  Rather less heavy; define a construction that says that what is embedded therein is in ML syntax.

Like this:

(ml  fun factorial n =
        if n = 0 then 1 else n * factorial (n - 1))

Then you have a macro saying what to do with your ML declaration

(defmacro mlmacro
   [ml | MLCode] -> (ml->shen MLCode))

Note that this lightweight approach can fall foul of reserved symbols like | which is used in ML for entirely different purposes 
to Shen.  One solution is to place the ML code in a string.   

But if you just want to write f(1,2) instead of (f 1 2) you can do this using the method above.

(defmacro brunosmacro
   [bruno | BrunoCode] -> (rewrite-notation BrunoCode))

Mark

bruno arias

unread,
May 12, 2021, 3:49:21 PM5/12/21
to qil...@googlegroups.com
ah ok, thank you for the wonderful explanation.
The second option seems lightweight enough for this toy example.

(ml  fun factorial n =
        if n = 0 then 1 else n * factorial (n - 1))

The idea of embedding different syntaxes (EDSLs) in a lisp seems like a lot of fun,
although I'm sure in practice it would often not be worth the trade-off.


--
You received this message because you are subscribed to a topic in the Google Groups "Shen" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/qilang/VUQERt-FOvw/unsubscribe.
To unsubscribe from this group and all its topics, send an email to qilang+un...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/qilang/CAGp%3DqtQfd0dNk4dGaPcXeCiTy7YqHBpti_LX8cMiqEL6rapS0g%40mail.gmail.com.

Mark Tarver

unread,
May 12, 2021, 4:36:57 PM5/12/21
to qil...@googlegroups.com
This will get you going - it is a beginning.

(0-) (defmacro bruno
   [bruno | BrunoCode] -> [package null []  | (parse-bruno BrunoCode)])
bruno

(1-) (define parse-bruno
        [F [X | Y] | Z] -> [(parse-bruno [F X | Y]) | (parse-bruno Z)]
        X -> X)
(fn parse-bruno)

(3-) (bruno *(1 2) cons(1 2) set(a 4))
2
[1 | 2]
4


Mark

You received this message because you are subscribed to the Google Groups "Shen" group.
To unsubscribe from this group and stop receiving emails from it, send an email to qilang+un...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/qilang/CALF%2B18nDB25bq%2B8iFMnQM0zPiWY1nWXkszdvm3GAis8pDoUf0A%40mail.gmail.com.

bruno arias

unread,
May 12, 2021, 5:39:53 PM5/12/21
to qil...@googlegroups.com
Ok this is amazing, although I feel sorry that I’m not grasping how it is that for example:

 *(1 2)

works. I would assume that Shen’s tokenizer world treat that as two atoms/symbols “*(1” and “2)”, but it clearly isn’t. 

This is what I’m understanding so far:

 (bruno *(1 2) cons(1 2) set(a 4))

In the above line, the part below is being passed to the macro bruno, which in turn passes it to the function parse-bruno

*(1 2) cons(1 2) set(a 4)

Ooh oops, never-mind I got it. Although, I’ll write it out in case any other newbie is following along. 

So as the above line is passed along to parse-bruno, its saying:
  if you find something of the format “function (list of elements)”, 
  then rewrite it as (function list of elements)
That corresponds to the part [F [X | Y]] -> [F X | Y]
The part with “... | Z] -> (parse-bruno Z)” is essentially saying parse the rest of the above line. So we are recursively rewriting each of the functions in the above line. 

[F [X | Y] | Z] -> [(parse-bruno [F X | Y]) | (parse-bruno Z)]

What struck me is that I would’ve expected something like “* (1 2) cons (1 2) set (a 4)” to work, but not the above to work. Probably, since I thought the tokenizer was using spaces as a delimiter. I suppose it does recognize “set(“ as two different tokens, a function and the beginning of a list. 

Once again thank you! Really interesting to see how Shen can handle these examples so concisely. I would never consider doing something like this inside another language; a parser generator or a compiler compiler are tools outside of a programming language, and parser combinators are usually a library used to parse a string inside of a language. 

I’m curious: the [X|Y] -> [Y|X] syntax (which I’m guessing is inspired by prolog), is that also created using macros?

Mark Tarver

unread,
May 13, 2021, 8:03:23 AM5/13/21
to qil...@googlegroups.com
[X | Y] goes to (cons X Y) but this is done by the Shen reader.  A lot of the Shen kernel is engineered through macros though, and without them it would be v. difficult to define Shen within a language as simple as Kl.

Macros effectively operate one-one, replacing one s-expr by another.  To achieve many-one, you have to wrap the many in some kind of container - hence the bruno keyword.  To achieve one-many, again you have to wrap the many into one.   Here we do this by placing the many in a package.  Packages are generally used to insulate code from name clashes and contents are renamed, but here we use the null package which parcels up the contents and does not rename them.

These devices allow you to splice DSLs into Shen code subject to the Shen reader reserving certain keywords for its own use.

1.  Characters indicating comments.
2.  [ , ] and | for lists.
3.  e numbers.

If you wanted a DSL which could be spliced into Shen code and which used reserved characters , then you would be best to embed the DSL code in a string.

(dsl "<domain specific code goes here>")

Then no macro would be needed.  Even neater would be to parameterise the dsl function so as to allow different DSLs.

  (dsl ml "<ML code goes here>")  

(define dsl
  ml Code -> (compile-ml 
                       (read-from-string-unprocessed
                         (read-reserved-chars Code))))
 
Use read-from-string in OS Shen.

(define read-reserved-chars
  "" -> ""
  (@s "|" Code) -> (@s " alt! " (read-reserved-chars Code))
  (@s "," Code) -> (@s " " (read-reserved-chars Code))
  (@s F "(" Code) -> (@s "(" F (read-reserved-chars Code))
  (@s S Code) -> (@s S (read-reserved-chars Code)))


(define compile-ML
   ML -> ML)

Give it a try 

(dsl ml "fun factorial 0 = 1
                | factorial n = n * factorial (n - 1)")
[fun factorial 0 = 1 alt! factorial n = n * factorial [n - 1]]

Now you would obviously put some meat into compile-ML to make it produce executable code in list form.

Once you're happy that what you are outputting is legal Shen you can get the reader to evaluate it by using a macro.

(defmacro dslmacro   
   [dsl Language Code] -> [package null [] | (dsl Language Code)])

Now if we type

(dsl ml "fun factorial 0 = 1
   | factorial n = n * factorial (n - 1)")

it will compile and evaluater.  But we get an error because compile-ML is not doing anything.

Shen is a very powerful metalanguage for designing DSLs and experimental languages and one of the reasons it will be around for a long time.    There are some long-range plans wrt this fact which will unfold in 2022.

Mark




Reply all
Reply to author
Forward
0 new messages