Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Parsing K in K (long, final version)

36 views
Skip to first unread message

Steve Apter

unread,
Oct 16, 1996, 3:00:00 AM10/16/96
to

I received quite a few comments on this paper, and a number of corrections.

This is the last version I'll publish here, although if anyone is interested
they can email me for the wisecracks I've removed.

Thanks for everyone's patience.

--

There is a paper by Bunda and Gerth in one of the old APL proceedings
which describes the idea of a pairwise binding grammar (PBG). In that
paper, the authors present grammars for the main dialects of APL and
demonstrate how small variations in binding strength between a fixed
set of types produce large differences in syntax.

In work carried out a few years later, Dave Landetta used a PBG to
develop a consistent extension of APL2 to arrays of functions. None
of the APL implementors picked up on Dave's ideas, which has always
puzzled me.

A brief discussion with Phil Benkard (the inventor of these grammars)
at last week's Minnowbrook Array Languages conference stimulated me to
think about implementing a PBG in K for pure K.

Now, the Bunda-Gerth grammars had to deal with such anomalies as bracket
indexing and the axis operator, which are blessedly absent from pure K.
Also, the operators of K are uniformly monadic; adverbs yes, conjunctions
no. Consequently, the analytical problem of defining a PBG for pure K
is relatively simple. But the programming problem is made no easier by
this fact. An APL2 effort, which I undertook in 1986 in order to study
Landetta's work, required more than a few lines of code, notwithstanding
the availability of n-wise reduction, precisely the iteration pattern
required by this problem. So I was curious to see how K might simplify
the programming of a general PBG.

To assist readability, I've adopted Roger Hui's convention of using a
single space in front of variable names. Thus, "apply foo to arg".


Pairwise binding grammars
-------------------------

A PBG consists of a set of syntax types and a pair of tables B and C.
The syntax types jointly and exhaustively classify the possible tokens.
For example, in K the type of the token + is v: + is a verb. B[x;y]
is the strength of the binding between entities of types x and y. C[x;y]
is the type of the result of binding entities of types x and y.

The PBG algorithm operates on a type vector. Given a list of tokens, the
type vector for that list is produced by replacing each token in the list
with its type. Given a type vector as input, algorithm produces a type as
its output if the type vector is well-formed, and signals a syntax error
if it is not.

The PBG algorithm itself is very simple. At each step, we reduce by
one the length of the type vector by replacing a pair of types in the
vector by a single type. Intuitively, we choose this pair by scanning
the type vector from the right, computing the binding strength of each
pair, and checking whether that number is less than the one previously
computed. When the binding strength drops, we replace the pair of types
immediately to the right with the type of the singleton which results
from binding that pair. Here is a schematic example:

First, compute the binding strength vector from the type vector:

a b c d e f g h i / schematic type vector
j k l m n o p q / schematic binding strength vector

Then, for each pair in the binding strength vector, see whether the left
element is less than the right element:

j<k k<l l<m m<n n<o o<p p<q
1 0 1 0 1 0 0

Observe that p<q asks whether g h binds less tightly than h i; the answer
is No. o<p asks whether f g binds less tightly than g h; the answer is
still No. n<o asks whether e f binds less tightly than f g; the answer
is Yes, and so we bind f to g.

By writing these results directly under the element whose left and right
binding strengths are being compared, we see immediately how to reduce
the length of the type vector:

a b c d e f g h i
j k l m n o p q
1 0 1 0 1 0 0
^
a b c d e(f g)h i

As an example, consider the expression

a + - b * c

which has the type vector

n v v n v n

Suppose that noun-verb binding is 2, verb-noun binding is 1, and the
binding strength of all other pairings is 0. So the pairwise binding
strengths of the type vector are:

2 0 1 2 1

and the relative binding strengths are:

0 1 1 0

Stacked thus:

a + - b * c / tokens
n v v n v n / types
2 0 1 2 1 / binding strengths
0 1 1 0 / pairwise <
^ / drop in binding strength
n v v(n v)n / bound pair


The grammar of pure K in K
--------------------------

Pure K is K with all syntactic sugar precipitated out into three under-
lying primitives: access/amend, spelled with ".", and enlist and join,
the monadic and dyadic primitives located on ",".

assignment a:b -> .(`a;();:;b)

indexed value a[b;...;b] -> .(`a;(b;...;b))

indexed assignment a[b;...;b]:c -> .(`a;(b;...;b);:;c)

function application a[b;...;b] -> .(`a;(b;...b))

list-notation (a;...;a) -> (,a),...,,a

Thus purified, K contains three syntactic types:

n nouns names, literals, &c.
v 20 verbs ~!@#$%^&*-_+=|:,<.>?
a 3 adverbs /\'

The binding strengths are:

n v a
- - -
n 1 3 4

v 2 1 4

a 0 0 0

Noun-adverb and verb-adverb > noun-verb > verb-noun > noun-noun and
verb-verb > adverb-anything.

The result types are:

n v a
- - -
n n v v

v n v v

a

Noun-adverb and verb-adverb are verbs; noun-verb and verb-noun are verbs;
noun-noun is a noun; verb-verb is a verb; adverb-anything and anything-
adverb are nothing.

Binding table first:

B[`n`v`a;`n`v`a]:(1 3 4;2 1 4;0 0 0)

The binding table B is a two-dimensional dictionary whose axes are the
syntactic types of pure K. To find the binding strength for verb-adverb,
index B by the symbols of those types, using either access:

B .`v`a
4

or high-calorie bracket indexing:

B[`v;`a]
4

Observe that the leading axis of B is the left type, and the second axis
the right.

Next, the type table:

C[`n`v`a;`n`v`a]:(`n`v`v;`n`v`v;```)

The type table C is a two-dimensional dictionary with the same structure
as B. To find the type of the result of binding an adverb to a verb on
its left:

C .`v`a
`v

Now let's type the tokens of an expression in pure K:

+/'a*-b+c / sum each of a times negate b plus c
vaanvvnvn

We'll represent the types as a vector of symbols, reversed relative to
the token list:

q:|`v`a`a`n`v`v`n`v`n
q
`n`v`n`v`v`n`a`a`v

The type vector is reversed because, while we want to attack the expres-
sion from the right, it is easier to program the scan from the left.

Next, we need a function which, given a type vector, returns the index of
the pair to bind.

pair:{(<':{B[x;y]}':x)?1}

pair q
1

That is, the first pair to bind is q 1 2, which correspond to b + in the
token list.

[ How does pair work? Let x be the reversed type vector:

x:`n`v`n`v`v`n`a`a`v

{B[x;y]}':x / index B pairwise by x
2 3 2 1 3 0 0 4

<':{B[x;y]}':x / pairwise less-than
0 1 1 0 1 0 0

(<':{B[x;y]}':x)?1 / first 1 = drop in strength
1
]

The function bind uses the result of pair to replace the bound pair with
the type of the binding:

bind:{,/.[(0,i+0 2)_ x;1;:;C .|x 0 1+i:pair x]}

bind q
`n `v `v `v `n `a `a `v

Observe that `n`v`n... has been correctly reduced to `n`v... .

[ How does bind work? We've located the point in the type vector where
binding strength drops. We want to replace a pair with a singleton,
the type of the result of binding that pair. We aim to partition the
type vector into three subvectors: the first subvector is everything
up to the pair to bind; the second subvector is the pair to bind; and
the third subvector is the rest. Once we have this partition, we can
replace the middle subvector with the result-type, which is C indexed
by the reverse of the types to bind. The main subexpression of bind
is an indexed amend of the partitioned vector:

.[indices _ data;1;:;new]

indices _ data cuts the data into partitions; 1 is the index into
those partitions (remember: origin 0!); : is assignment, or replace;
and new is the data to assign. The partition is then razed using
,/ ]

The reduction should continue until a single symbol remains, which then
represents the type of the result of evaluating the expression:

type:{*{1<#x}bind/x}

type q
`n

[ How does type work? K's extended reduce is called "over". Let x
be any data, f be any monadic function, and g be any monadic func-
tion returning 0 or 1. Then

g f/x

applies f to x until g x is not 1. The result of f x is fed back
to f as its argument until g x is false. Hence, bind x until the
length of the result is 1. The initial * says: return the first
element of the result. ]

To see the intermediate steps, scan bind over the type vector until
the result has length 1:

{1<#x}bind\q
(`n `v `n `v `v `n `a `a `v
`n `v `v `v `n `a `a `v
`n `v `v `n `a `a `v
`n `v `n `a `a `v
`n `v `a `a `v
`n `a `a `v
`n `a `v
`n `v
,`n)


Pure K + parentheses
--------------------

In pure K, parentheses turn verbs into nouns. For example:

3#(+)
(+;+;+)

The result, a list, contains nouns as parts. The expression which pro-
duces this list is correctly parsed as follows:

3#(+)
-----
nv(v) / type each token
nvn / (v) -> n
vn / nv -> v
n / vn -> n

To type an expression containing parentheses, recursively replace each
parenthesized token list with a one element list of the type vector of
that list. For example:

(+ / ' a * - b + c) % + / a * b + c
(,`v`a`a`n`v`v`n`v`n),`v`v`a`n`v`n`v`n

We won't bother to reverse the type vectors, leaving that job to the
parsing function.

The algorithm to parse a general type list is: parse each sublist;
if it completes, type it as a noun; parse the resulting type vector.

The K functions which implement this algorithm are:

list:{type@|@[x;&~@:'x;noun@|:]}
noun:{:[4:x;type x;(_f@|:)'x];`n}

Two examples:

Parse (+/'a*-b+c)%+/a*b+c:

r:(,`v`a`a`n`v`v`n`v`n),`v`v`a`n`v`n`v`n
list r
`n

Parse 3#(+):

s:`n`v,,,`v

s
(,`v
`v
`n)

list v
`n

[ How does list work? The function applies type to the reverse of an
amend of its argument:

type@| ...

The amend expression,

@[x;&~@:'x;noun@|:]

says: amend each non-atomic part of x with the noun-reverse of that
part. For example, if x is r above:

(`n
`v
`n
`v
`n
`a
`v
`v
`n `v `n `v `v `n `a `a `v)

then x 8, corresponding to the parenthesized subexpression, is the
only non-atomic item. To that item we apply noun-reverse. noun
applies self-reverse recursively until it reaches a non-list; that
argument is parsed. But no matter what the result, noun returns
`n. That is because parentheses convert all types to noun. ]


The complete set of K functions required to implement a PBG is:

B[`n`v`a;`n`v`a]:(1 3 4;2 1 4;0 0 0)
C[`n`v`a;`n`v`a]:(`n`v`v;`n`v`v;```)

type:{*{1<#x}bind/x}
bind:{,/.[(0,i+0 2)_ x;1;:;C .|x 0 1+i:pair x]}
pair:{(<':{B[x;y]}':x)?1}

list:{type@|@[x;&~@:'x;noun@|:]}
noun:{:[4:x;type x;(_f@|:)'x];`n}


0 new messages