NLP, Shen, Shen Prolog, example with semantics

67 views

Skip to first unread message

Antti Ylikoski

unread,

Jul 9, 2016, 2:25:51 PM7/9/16

to Shen

DCGs, Definite Clause Grammars, are familiar to all who have studied
NLP. One of the original motivations to develop Prolog was to use it
for handling DCGs, by Alain Colmerauer's group IIRC in Marseille.

Quite nontrivial things can be done with DCGs. In this entry, I
describe a relatively sophisticated but pretty well-known exercise,
not only to parse natural language, but also to attach the semantics
to the parsed language.

The said semantics of the language is in the form of First Order
Predicate Logic. This demo program is here because translating it
into Shen Prolog would be rather easy, with the work hours available.
I have done some work for that end.

The file dcg_semant.pl contains the Prolog program.
Characteristically of Prolog, a very nontrivial thing can be done
succinctly.

The file dcg_semant.txt is the sample run, with both positive and
negative examples.

yours, Dr A. J. Y.
HELSINKI
Finland, the E.U.

dcg_semant.pl

dcg_semant.txt

Mark Tarver

unread,

Jul 10, 2016, 4:48:23 AM7/10/16

to Shen

You might find this interesting - from 2009 using Qi. Not looked at it for years so usual disclaimers apply.

Mark

Parsing as Algebra

This study connects two areas not generally associated in CS - parsing and algebra. I will show that parsing in context free grammars can be viewed as a species of algebra concerned with simple laws of addition.

So where to begin? Parsing or algebra? Lets begin with algebra, and I emphasise that the algebra we are looking at is very simple and relates to basic properties of addition and equality. The first rule needs no justification.

1. For all x, x = x.

The second is almost as simple; it enables us to cancel common terms on each side of an equation.

2. The equality x + y = x + z may be proved if y = z can be proved.

The third pair reflects the associativity of addition.

3a. (x + y) + z = w may be proved if x + (y + z) = w can be proved
3b. w = (x + y) + z may be proved if w = x + (y + z) can be proved

The fourth rule allows us to substitute equals for equals.

4. Given x = y, then x + z = w can be proved if y + z = w can be proved.

The final rule allows us to add zero to each side of an equality.

5. x = y can be proved if x + 0 = y + 0 can be proved.

and that's it.

Now what is the significance of these rules? Actually the significance is that when suitably animated (my jargon, meaning, applied in a particular control order using the computer), they form a model for a top-down recursive descent parser. One oddity about this, is that it is not obvious at first sight, but becomes obvious after an example.

Doing a Proof (or is it Parsing?)

OK, how does this work? The parsing problem has to be placed in a canonical form of the following kind.

1. Every grammar rule used in the parse of the form x --> y z has to be expressed as an equality x = (y + z). These equalities are all assumptions in the proof.

2. Amongst these equalities there is one of the form s = x, where s is the distinguished symbol meaning sentence.

3. If S is the sentence to be parsed then the equality to be proved is s = S.

Example: Here are a bunch of grammar rules used as assumptions

1. s = (np + vp)
2. np = (det + n)
3. det = "the"
4. n = "girl"
5. n = "boy"
6. vp = vintrans
7. vp = (vtrans + np)
8. vintrans = "jumps"
9. vtrans = "likes"
10. vtrans = "loves"

We want to prove "the boy jumps" is grammatical which means that we want to prove the equality s = ("the" + ("boy" + "jumps")) using these assumptions and the algebraic laws 1-5.

Theorem: s = ("the" + ("boy" + "jumps"))

Proof: by backwards reasoning.

	rule used
s = ("the" + ("boy" + "jumps"))
(s + 0) = (("the" + ("boy" + "jumps")) + 0)	5
((np + vp) + 0) = (("the" + ("boy" + "jumps")) + 0)	4 (using assumption 1)
(np + (vp + 0)) = (("the" + ("boy" + "jumps")) + 0)	3a
((det + n) + (vp + 0)) = (("the" + ("boy" + "jumps")) + 0)	4 (using assumption 2)
(det + ((n + vp) + 0)) = (("the" + ("boy" + "jumps")) + 0)	3a
("the" + ((n + vp) + 0)) = (("the" + ("boy" + "jumps")) + 0)	4 (using assumption 3)
("the" + ((n + vp) + 0)) = ("the" + (("boy" + "jumps") + 0))	3b
((n + vp) + 0) = (("boy" + "jumps") + 0)	2
((n + vp) + 0) = ("boy" + ("jumps" + 0))	3b
(n + (vp + 0)) = ("boy" + ("jumps" + 0))	3a
("boy" + (vp + 0)) = ("boy" + ("jumps" + 0))	4 (using assumption 5)
(vp + 0) = ("jumps" + 0)	2
(vintrans + 0) = ("jumps" + 0)	4 (using assumption 6)
("jumps" + 0) = ("jumps" + 0)*	4 (using assumption 8)
solved	1

* This step shows why we need rule 5 at the beginning, without the zero the step is not formally correct

A Proof Procedure

There is a proof procedure for solving these problems which corresponds to top down recursive descent parsing.

1. Apply rule 5 at the beginning only.
2. Apply rules 1, 2 and 3a and 3b everywhere possible. If rule 1 is successfully applied the proof ends successfully. S is grammatical.
3. Rule 4 is non-deterministic; so if the proof is blocked, then return to the last application of this rule and look for another legal application of it. If no new application of rule 4 can be found anywhere, the proof fails. S is not grammatical.

Doing It in Prolog

Generally I'm a fan of functional programming, but this procedure lends itself to a Prolog formulation that is a little gem of 8 lines.

\********************************************************************************\

\A 8 line algebra ATP in Qi Prolog (or is it really a recursive descent parser?). Users of regular Prolog need to put in those tedious commas into the lists.\

(m-prolog "

parse([s = S], A) :- parsing([[s + 0] = [S + 0] ], A).

parsing([X = X],_).
parsing([[X + Y] = [X + Z]], A) :- !, parsing([Y = Z],A).
parsing([[[X + Y] + Z] = W], A) :- !, parsing([[X + [Y + Z]] = W], A).
parsing([W = [[X + Y] + Z]], A) :- !, parsing([W = [X + [Y + Z]]], A).

parsing([[X + Y] = Z], A) :- member([X = W],A), parsing([[W + Y] = Z], A).

member(X,[X | _]).
member(X,[_ | Y]) :- member(X,Y).")

\***********************************************************************************\

And here is an example of it working.

(8-) (set *grammar* [
[s = [np + vp]]
[np = [det + n]]
[det = "the"]
[n = "girl"]
[n = "boy"]
[vp = vintrans]
[vp = [vtrans + np]]
[vintrans = "jumps"]
[vtrans = "likes"]
[vtrans = "loves"]])
[[s = [np + vp]] [np = [det + n]] [det = "the"] [n = "girl"] [n = "boy"] [vp = vintrans] [vp = [vtrans + np]]
[vintrans = "jumps"] [vtrans = "likes"] [vtrans = "loves"]]

(9-) (prolog? (parse [s = ["the" + ["boy" + "jumps"]]] (value *grammar*)))
true

Conclusions

There is something rather satisfying about seeing the unity of two apparently different processes, and I found the above rather pleasing, like a strange pebble picked up from the beach that when polished shows up a beautiful colour. However non-utilitarian the insight, it has always given me a strange pleasure to see how the mental gymnastics I learnt as a boy can find a purchase in formal language recognition. Collecting these pebbles can become a passion.

Mark
dr.mt...@ukonline.co.uk

Reply all

Reply to author

Forward

0 new messages