Data constructors (judgements)

Belal

Mar 4, 2021, 8:00:56 PM
to Grammatical Framework
Hello,

I have a question about data constructors: when is it necessary to use them as opposed to defined functions (data vs. fun and def)? What practical applications warrant the exclusive use of data constructor judgements?

I've only seen them used in the RGL in the Numeral.gf and NumeralCgg.gf abstract files. I also saw a comment about data vs. defined function judgements in the RGL source code for Ancient Greek (in TransferGrcAbs.gf) but I couldn't understand the rationale:

-- The transformation of a (PossPron pron):Quant to an adjective or Adv is impossible,
-- since it would need data DetCN rather than fun DetCN !
--
-- data DetCN : Cat.Det -> Cat.CN -> Cat.NP ;
--
-- fun possAdj : NP -> NP ;
-- def possAdj (DetCN (DetQuant (PossPron pers) num) cn) =
-- (DetCN (DetQuant DefArt num) (AdvCN cn (PrepNP possess_Prep (UsePron pers)))) ;

-- Likewise, PartVP is not a data constructor!
-- fun partAP : AP -> AP ;
-- def partAP (PartVP vp) = PartPresVP PPos vp ; 

I've read the relevant section on data constructor judgements in the online GF Language Reference Manual (https://www.grammaticalframework.org/doc/gf-refman.html#data-constructor-definitions-data), but I am still unsure how to use data judgements. In particular, what do the terms t1, t2, ... tm mean in the data judgement syntax?

data f : A1 -> ... -> An -> C t1 ... tm

===

fun f : A1 -> ... -> An -> C t1 ... tm ; data C = f 

While I understand that both types of judgements (data vs. fun + def) play a role in constructing abstract syntax trees (sub-trees), some clarification on the practical differences between the two would be helpful. Thank you.

Best regards,
Belal

Inari Listenmaa

Mar 4, 2021, 8:24:33 PM
to Grammatical Framework
Hi Belal,

Good question, and not at all well documented!

Def is a way to translate trees to other trees. Here's an example of a GF file that uses def:


--from NumeralTransfer.gf
fun dn10 : Dig -> Sub10 ;
def dn10 D_1 = pot01 ;
    dn10 d1  = pot0 (dn d1) ;

fun dn : Dig -> Digit ;
def dn D_2 = n2 ;
    …
    dn D_9 = n9 ;

-- from Numeral.gf:
data
  pot01 : Sub10 ;          -- 1
  pot0 : Digit -> Sub10 ;  -- d * 1

The whole point of NumeralTransfer is to translate digits, like "100", into RGL numerals, like "hundred". In order to do the translation with def, the relevant functions need to be defined as data; that's why, in Numeral.gf, pot01 and pot0 are defined as data, not fun. The transfer functions dn and dn10, on the other hand, are defined as fun and then given a def.
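For readers more at home outside GF, the same two transfer functions can be transcribed into Python. This is only a sketch under assumptions, not how GF evaluates defs: trees are tuples `(function, *arguments)`, the cases elided by "…" above are assumed to follow the visible pattern (`D_k` maps to `nk` for k = 2..9), and an input that no rule covers is returned unreduced, the way a GF term with no matching def is left as it is.

```python
Tree = tuple  # hypothetical model: (function_name, *argument_trees)

# D_2 .. D_9; the elided cases D_3 .. D_8 are assumed to follow the shown pattern
DIGITS = {f"D_{k}" for k in range(2, 10)}


def dn(d: Tree) -> Tree:
    """Sketch of `dn : Dig -> Digit` (def dn D_2 = n2 ; ... ; dn D_9 = n9)."""
    if d[0] in DIGITS:
        return ("n" + d[0][2:],)
    return ("dn", d)  # no rule matches: the application stays unreduced


def dn10(d: Tree) -> Tree:
    """Sketch of `dn10 : Dig -> Sub10`; rules are tried top to bottom."""
    if d == ("D_1",):          # def dn10 D_1 = pot01
        return ("pot01",)
    return ("pot0", dn(d))     # dn10 d1 = pot0 (dn d1), for any other digit


print(dn10(("D_1",)))  # ('pot01',)
print(dn10(("D_7",)))  # ('pot0', ('n7',))
```

Note that pot01 and pot0 only occur on the right-hand sides here; it is the left-hand patterns (the digits D_1, D_2, ...) that must be constructors for the rules to match.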

This pull request can also be useful to read. https://github.com/GrammaticalFramework/gf-rgl/pull/8

However, the whole def syntax is not well supported anymore. You can only do transfer on command line from GF shell, and even that doesn't work anymore as it used to, see this discussion. 
So if you want to do tree transformations, it's better to compile your grammar to PGF, and use an external programming language to manipulate the trees as you like. You can read this blog post https://inariksit.github.io/gf/2019/12/12/embedding-grammars.html for help, it contains installation instructions of the libraries and an example, for both Haskell and Python.

Inari

--

---
You received this message because you are subscribed to the Google Groups "Grammatical Framework" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gf-dev+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gf-dev/228b2ac3-4abd-474d-a4dd-0bee223aabe6n%40googlegroups.com.

Hans Leiss

Mar 5, 2021, 1:07:11 PM
to gf-...@googlegroups.com
Hi Belal,

as I once wrote the comment in the AncientGreek grammar you mention, here's the
rationale you missed: to define tree transformations, one would like to
do case distinction by pattern matching over the input tree. Pattern
matching must know what the constructors of, say, the abstract trees of
category NP are. For some reason, the abstract syntax of GF often uses
(or used when I played around with AncientGreek) fun-declarations like

fun DetCN : Det -> CN -> NP, not data DetCN : Det -> CN -> NP.

This implied that DetCN (and likewise DetQuant, PossPron) is not
recognized as a tree constructor in the (intended) tree pattern

pat1 = (DetCN (DetQuant (PossPron pers) num) cn)

in a definition like

def possAdj pat1 = t1 | ... | possAdj patn = tn,

and GF then treated DetCN in pat1 as a *variable* (leading to a syntax
error in the above case, I think). So to write tree transformation of
abstract syntax trees inside GF, the abstract syntax would have to use
data-, not fun-declarations for the syntax constructors. (I never
understood why it didn't, and I think it's a pity that one can't do tree
transformation inside GF any more, if I understand Inari correctly.)
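The constructor-versus-variable distinction can be modelled in a few lines of Python. This is an illustration under stated assumptions, not GF's actual matcher: trees are tuples `(function, *arguments)`, a pattern is either a bare name or a tuple, and a bare name counts as a constructor only if it is in the given set of data constructors; every other bare name is a pattern variable. A pattern headed by a non-constructor is rejected, mirroring the error described above. The leaves `i_Pron`, `NumSg`, `UseN` and `house_N` are just example trees.

```python
from typing import Optional

Tree = tuple  # (function_name, *argument_trees)


def match(pattern, tree: Tree, constructors: set, env: Optional[dict] = None) -> Optional[dict]:
    """Return variable bindings if `tree` matches `pattern`, else None."""
    env = {} if env is None else env
    if isinstance(pattern, str):
        if pattern in constructors:          # nullary constructor: must match literally
            return env if tree == (pattern,) else None
        env[pattern] = tree                  # any other bare name is a variable
        return env
    head, *pargs = pattern
    if head not in constructors:
        # A fun-declared name is a variable, and a variable cannot be applied in a pattern.
        raise SyntaxError(f"{head} is not a data constructor; cannot match on it")
    if tree[0] != head or len(tree) - 1 != len(pargs):
        return None
    for p, t in zip(pargs, tree[1:]):
        if match(p, t, constructors, env) is None:
            return None
    return env


# def possAdj (DetCN (DetQuant (PossPron pers) num) cn) = ...
POSS_PATTERN = ("DetCN", ("DetQuant", ("PossPron", "pers"), "num"), "cn")


def poss_adj(np: Tree, constructors: set) -> Tree:
    env = match(POSS_PATTERN, np, constructors)
    if env is None:
        return np                            # no rule matches: tree unchanged
    return ("DetCN", ("DetQuant", ("DefArt",), env["num"]),
            ("AdvCN", env["cn"],
             ("PrepNP", ("possess_Prep",), ("UsePron", env["pers"]))))


my_house = ("DetCN", ("DetQuant", ("PossPron", ("i_Pron",)), ("NumSg",)), ("UseN", ("house_N",)))
print(poss_adj(my_house, {"DetCN", "DetQuant", "PossPron"}))  # rewritten tree
# poss_adj(my_house, {"DetQuant", "PossPron"}) raises SyntaxError: DetCN is only a fun
```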


Concerning your second question:

> In particular, what do the terms t1, t2, ...tm mean in the data judgement syntax?
>
> data f : A1 -> ... -> An -> C t1 ... tm

Here, I think, C t1 ... tm is just the general case of a *dependent*
category, (category C depending on objects or context t1 ... tm) as it
is used in the GF-book, p.135, in the declaration of LessZ:

cat Less Nat Nat ;
fun LessZ : (y:Nat) -> Less Zero (Succ y) ;

with
cat Nat ;
fun Zero : Nat ; fun Succ : Nat -> Nat ;

Best regards,
Hans

Belal

Mar 5, 2021, 7:49:38 PM
to Grammatical Framework

Thank you Inari and Hans for your explanations! 

So this is what I understood, please correct me if I'm wrong :)

1) "def" judgments are used to define transformations in abstract syntax trees (i.e. two trees - or subtrees - are equal)
2) In "def" judgments, we use pattern matching to define semantically equal trees.
3) In order to use tree constructors in pattern matching, the constructor needs to be defined with the judgement "data", otherwise, that constructor is interpreted as a variable.

----------------------

With regard to "transfer", I read that "transfer" is a reserved word for a module type in GF and it follows the syntax:

ModType -> transfer Ident : Open -> Open


Is this still the case in the latest version of GF? Are there examples of this in the RGL? Or is it no longer supported, as per Inari's comment above?

--------------------------------

Also, looking in the GF book, documentation, and tutorials online, I've noticed some conflicting usages of flags for the "put_tree" command in the GF shell when it comes to "transfer" or "transforming" trees at run-time. Sometimes, the flag "-transform" is used and other times "-transfer" is used:

Computation in GF is performed with the put_term command and the compute transformation, e.g.

> parse -tr "1 + 1" | put_term -transform=compute -tr | l
plus one one
Succ (Succ Zero)
s(s(0))


> p -tr "John runs or Mary runs" | pt -tr -transfer=aggr | l
ConjS (PredVP John Run) (PredVP Mary Run)
PredVP (ConjNP John Mary) Run
John or Mary runs

(see end of Chapter 6 in GF book)

But neither seems to be mentioned anymore in the documentation of "put_tree" shown by the "h pt" command:

> h pt
pt, put_tree

return a tree, possibly processed with a function

syntax:
  pt OPT? TREE

 -largest     sort trees from largest to smallest, in number of nodes
 -nub         remove duplicate trees
 -smallest    sort trees from smallest to largest, in number of nodes
 -subtrees    return all fully applied subtrees (stopping at abstractions), by default sorted from the largest
 -funs        return all fun functions appearing in the tree, with duplications

flags:
 -number      take at most this many trees

examples:
  pt -compute (plus one two)    --compute value

I can't find the -paraphrase flag either.

Considering the above, what is the currently supported way to perform "transfer" or run-time "transform" operations in the GF shell as per the latest version of GF? I understand that this can probably be done more easily outside of GF in Python or some other supported language, but I would imagine that for testing purposes, it is more convenient to do it in the GF shell.

Please advise. Thanks!

- Belal

Inari Listenmaa

Mar 5, 2021, 8:11:46 PM
to Grammatical Framework
As noted in this thread https://groups.google.com/g/gf-dev/c/n75o_XazllU/m/5-pJR49uDgAJ, the way that still works is this:

The put_tree command no longer seems to have the -transfer option, however. The only way I managed to make it work is by:
Lang> p "she sees him"
PhrUtt NoPConj (UttS (UseCl (TTAnt TPres ASimul) PPos (PredVP (UsePron she_Pron) (ComplSlash (SlashV2a see_V2) (UsePron he_Pron))))) NoVoc

Lang> pt -compute PhrUtt NoPConj (UttS (UseCl (TTAnt TPres ASimul) PPos (active2passive (PredVP (UsePron she_Pron) (ComplSlash (SlashV2a see_V2) (UsePron he_Pron)))))) NoVoc | l
he is seen by her

i.e., I copied and pasted the parsed tree, adding `active2passive` at the proper place.

I just tested on my own computer on GF 3.10.4, and it still works.
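Without a -transfer option, the copy-and-paste step can be scripted outside GF. The Python below is a hypothetical helper, not part of GF or its PGF libraries: it parses a tree in the bracketed format the shell prints, wraps the first subtree whose function is `PredVP` in `active2passive`, and prints a `pt -compute` command to paste back into the shell. It assumes the tree contains only identifiers and parentheses (no string literals or metavariables) and that the first `PredVP` found is the intended place.

```python
import re

Tree = tuple  # (function_name, *argument_trees)


def parse_tree(s: str) -> Tree:
    """Parse a GF-style tree such as 'f a (g b c)'; identifiers and parentheses only."""
    tokens = re.findall(r"\(|\)|[^\s()]+", s)
    pos = 0

    def atom() -> Tree:
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            t = app()
            pos += 1  # skip the closing ")"
            return t
        return (tok,)

    def app() -> Tree:
        nonlocal pos
        head = tokens[pos]
        pos += 1
        args = []
        while pos < len(tokens) and tokens[pos] != ")":
            args.append(atom())
        return (head, *args)

    return app()


def show(t: Tree) -> str:
    """Print a tree back in GF's format, bracketing non-atomic arguments."""
    return " ".join([t[0]] + [a[0] if len(a) == 1 else f"({show(a)})" for a in t[1:]])


def wrap_first(t: Tree, head: str, fn: str) -> Tree:
    """Wrap the first subtree (pre-order) whose function is `head` in `fn`."""
    if t[0] == head:
        return (fn, t)
    for i, a in enumerate(t[1:], start=1):
        new = wrap_first(a, head, fn)
        if new is not a:  # something was wrapped inside this argument
            return t[:i] + (new,) + t[i + 1:]
    return t


parsed = ("PhrUtt NoPConj (UttS (UseCl (TTAnt TPres ASimul) PPos (PredVP (UsePron she_Pron) "
          "(ComplSlash (SlashV2a see_V2) (UsePron he_Pron))))) NoVoc")
print("pt -compute " + show(wrap_first(parse_tree(parsed), "PredVP", "active2passive")) + " | l")
```

The printed command is the same one Inari built by hand above.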

Thanks for pointing out the inconsistencies in the documentation. I'd like to know if supporting this is intentionally deprecated, or just happened to be so accidentally. If intentional, I'll remove the traces of it in the documentation.

Inari

Hans Leiss

Mar 6, 2021, 3:54:33 PM
to gf-...@googlegroups.com
Hi Belal,

the idea behind "def" judgements is to enable computations with
gf-terms, as the gf-refman.html explains under "Conversions": besides alpha,
beta and eta conversions, there is a further form of conversion:

> delta conversion: f a1 ... an = tg, if
> - there is a definition def f p1 ... pn = t
> - this definition is the first for f that matches the sequence a1 ... an,
> with the substitution g

This is more general than your 1) and 2): there is no restriction to
equality of trees. Your 3) is correct, except that instead of

data f : A1 -> ... -> An -> C ;

one can also write two separate declarations

fun f : A1 -> ... -> An -> C ;
data C = f ;

The put-tree command had an option -transfer up to gf-3.8 (defined among
allTreeOps in gf-3.8/src/compiler/GF/Command/TreeOperations.hs), but it
was removed in gf-3.9, along with the option -paraphrase.

I just checked the sources of the gf-rgl/src/abstract: only Noun,
Numeral, Sentence and Verb use data-, not fun-declarations for (some of)
their syntax constructions. It seems these are just those that are
needed for the two tree-transformations

active2passive : Cl -> Cl and digits2numeral : Card -> Card

in Transfer.gf and NumeralTransfer.gf.

Krasimir's comment on https://github.com/GrammaticalFramework/gf-rgl/pull/8

> The definition of the language demands that all functions defined as `data`
> must be in the same module as the `cat` definition for their result type.
> This is kind of like in Haskell.

> The RGL has one Cat module which defines all categories and then the
> functions are spread in several modules. This means that using `data` in
> the RGL is against the language specification. However, I am not sure that
> the compiler checks that restriction.

or rather the definition of the language is strange. Certainly, the
compiler does *not* check that restriction: the data declarations of the
RGL are in Noun, Numeral, Sentence, Verb, not in Cat, and the two
tree-transformations work.

I fear the rare use of data-declarations in the RGL discouraged people from
writing tree transformations and led the developers to think that -transfer
should be removed, which I think is a pity.

Hans

P.S. Years ago, at least, I did not want to change abstract/Noun.gf and
abstract/Adjective.gf of the RGL to define the tree transformations
possAdj : NP -> NP and partAP : AP -> AP in AncientGreek. Maybe I could
have just added

data NP = DetCN ; ...

and whatever is needed to TransferGrcAbs.

Aarne Ranta

Mar 7, 2021, 4:09:31 AM
to gf-...@googlegroups.com
Belal, Inari, Hans,

Thank you for opening the discussion about data, def, compute, and transfer! I must say that I am a bit behind in understanding how these work in the latest GF implementation. So I had to do some experiments myself to see what works.

As always, I do think that backward compatibility is one of the essential values of GF, which we should not sacrifice for any reason, however valid it might look at some moment. It is the contract that developers have made with users, and it must not be broken.

However, the definition of backward compatibility must permit experiments and variations. So here is how I would define it:

- all the functionalities of the GF language defined in the reference manual, (on-line and GF book, Appendix C) must be maintained
- the published RGL API must be maintained, in particular the Syntax and Paradigms modules

At the same time,

- experimental additions to the GF language and GF shell, which are not mentioned in the reference manual, are not a part of the contract
- internals of the RGL, in particular the concrete linearization types, can be changed, since they are not accessible by the API
- run-time systems other than Haskell (including the GF shell) need not implement all functionalities (e.g. def and data)

With this in mind, and testing with the latest gf-core code, I notice that

- transfer is not a documented feature, and seems to have been removed. This is not a violation. I will write a few words about transfer further below.
- paraphrase is a feature mentioned in the GF book, Appendix E, and has been removed from the shell but it is still available in the PGF API. This is a borderline case, as it is a shell functionality and not a language feature. But I wonder why it was removed from the shell.
- data f : C, def, and pt -compute are supported as they should be
- data C = f | g | ...  seems not to be supported any more: if  I write

    cat Bit ;
    fun Zero, One : Bit ;
    data Bit = Zero | One ;

  I get a mysterious error message:

- compiling Data.gf...

Data.gf:19:2:
   conflicting information in module Data
    fun Zero : Bit ;
and

Languages:


  This is clearly a violation against the reference manual. As I will explain below, I think this construct was a mistake to add (data Zero : Bit is better), but not supporting it any more breaks a contract, and it should not be difficult to restore.

So now to the theory behind data, def etc. Hans already explained it correctly, but I would like to add some background, which most GF programmers probably don't know.

The data model of the abstract syntax was inherited from ALF, a predecessor of the current Agda, released in 1992. A description can be found in 


Unlike the present Agda, ALF was a "pure logical framework", implementing Martin-Löf's "higher-level type theory", with just the minimum of programming language constructs. I used it extensively in experiments with type-theoretical grammars before I started to develop GF in 1998. The reasons to start GF were (1) to formalize concrete syntax in a way that allowed the reversibility of linearization rules to parsing, and (2) to guarantee the continuity of type-theoretical grammars as ALF was going into deprecation in favour of an early version of Agda.

A *theory* in ALF - just like an *abstract syntax* in GF - is a definition of mutually inductive data types. I don't quite remember all the details of ALF's syntax, but here are the correspondences of the main concepts:

- declaring a new category C
  - ALF  C : Set
  - GF cat C
- declaring a constructor function c of type T
  - ALF  c : T C  (the letter C is a part of the syntax)
  - GF data c : T
- declaring some other function f of type T
  - ALF f : T I (the letter I is a part of the syntax, meaning "implicitly defined")
  - GF fun f : T
- defining a function f with pattern matching
  - ALF f ps = t
  - GF def f ps = t 

A fundamental difference from languages like Haskell (and Agda as far as I know) in both ALF and GF is that the constructors of a data type can be defined in several different modules. A datatype is hence "extensible", which permits just the kind of modularity that we want in writing large grammars and, in particular, lexica. Just think about how tedious it would be to write all functions whose value is C in the same module where "cat C" was declared!

This is why I think the format "data C = f | g | ..." was a mistake in the first place, and that "data f : C" is the right way to go. There should be absolutely no need to define all constructors of a data type in one and the same module.

The whole discussion is of course relevant only if one needs to care about the fun/data distinction - i.e. only when def definitions are used in GF. My feeling is that most GF users actually mean "data" when they write "fun". This is also reflected in the generation of Haskell modules from the abstract syntax: all "fun" declarations show up as data constructor declarations in the resulting Haskell code. And Haskell (or Python or Java) has conveniently replaced the need of GF-internal computation, paraphrasing, and transfer in many projects.

This said, I also agree that all functions in the RGL should be "data" rather than "fun". This would be an easy change in those modules, and I guess no-one would ever notice the change, except for those who have wanted it.

A summary of suggested actions:

- restore 'data C = f | g | ...'  because of backward compatibility, but change the refman by recommending not to use it
- restore 'pt -paraphrase'  because of backward compatibility?
- restore 'pt -transfer=f' ?
- change "fun" in RGL to "data" ?

Comments on this are welcome, as well as contributions!

Regards

  Aarne

Hans Leiss

Mar 7, 2021, 2:40:27 PM
to gf-...@googlegroups.com
Hi Aarne,

thanks for your background explanations. Some comments:

> There should be absolutely no need to define all constructors of a
> data type in one and the same module.

I agree, mainly because of Extra/Extend (otherwise most constructors of
the RGL with the same result type appear in the same module already).

This implies that the (even more restrictive part of the) language
definition in C.3.6

However, all these constructor definitions (i.e. data C = f1 | ... | fn)
must appear in the same module in which the category is defined.

has to be removed: the categories are defined in Cat, the constructors
in other modules Adjective,...,Verb. Relaxing the language definition
should be tolerable; the restriction is a mistake and was never taken seriously.

> This is why I think the format "data C = f | g | ..." was a mistake in
> the first place, and that "data f : C" is the right way to go.

I'm not so sure, as I suspect there has been a reason for the data/fun
distinction in the abstract syntax of the RGL, though I never understood
which one. I always thought the abstract syntax ought to have data
declarations only, so that any transformation of abstract trees can be
defined (and the abstract syntax is a free algebra). But *if* there is a
reason to have second-class (i.e. fun-) syntax constructions, it must be
stated and explained somewhere. According to C.3.7, data- and def-
declarations are mutually exclusive, which implies GF-grammars can have
abstract syntax constructs with computation rules. I can see a
reason for these, though I guess nobody uses them so far.

Generic Example: Consider syntax constructs

fun F : A -> B ; data G1 : A1 -> A ; ... ; Gk : Ak -> A ;

We might want to define (lin F)(a) by case distinction over the
construction of a, but not lift the full lincat Ai to lincat A (by the
lin Gi), because, for example, most uses of A's can be done with part of
the lincat Ai's. Then we could define

def F (G1 a1) = f1 a1 ; ... ; F (Gk ak) = fk ak ;

and let lin f1, ..., lin fk have access to the full linearizations of
the *indirect* constituents a1, ..., ak of F's argument, while other
constructs

data F' : A -> B ;

(resp. lin F') might be happy with the (lesser) information in the
linearizations of their argument (G1 a1), ..., (Gk ak). We would then
get parse trees like F (G1 a), with

pt F (G1 a) = F (G1 a) and pt -compute F (G1 a) = f1 a.

The f1,...,fk would be some kind of auxiliary abstract syntax rules.

To repeat: we may want (lincat A) to contain as little information as
needed in most uses of A's, and in the rare cases where more information
is needed, get access to the information of the indirect constituents of
A's given by (lincat Ak).
[I checked that it works in GF, with an artificial example.] -- End of Example.

> The whole discussion is of course relevant only if one needs to care
> about the fun/data distinction - i.e. only when def definitions are
> used in GF. My feeling is that most GF users actually mean "data" when
> they write "fun".

Yes, but some users *might have* used fun's with def's in their abstract
syntax, so one cannot make

data f : A -> C ;

be the only kind of function declaration in the abstract syntax of
arbitrary GF-grammars.

> This is also reflected in the generation of Haskell modules from the
> abstract syntax: all "fun" declarations show up as data constructor
> declarations in the resulting Haskell code. And Haskell (or Python or
> Java) has conveniently replaced the need of GF-internal computation,
> paraphrasing, and transfer in many projects.

I can't tell if the "conveniently" is true, but I was always a bit
disappointed by this. If the syntax constructors of the RGL were
data-declarations, one could write tree transformations inside GF, some
of which could be useful in many languages. For example, I guess
transformations like

the last three papers of mine <-> my last three papers

work for several european languages, and simplifying tree
transformations like removing subordinate clauses ((S, because S') => S)
also do. To my feeling, computations with abstract syntax trees belong
to the core tool GF, and should not be outsourced to (application)
projects (and repeated in different projects). [Assuming the GF-internal
computation is sufficiently efficient and convenient.]

> A summary of suggested actions:

> - restore 'data C = f | g | ...' because of backward compatibility,
> but change the refman by recommending not to use it

I would omit the recommendation, but instead say that most abstract
grammars will want to use declarations "data f : C" instead of "fun f :
C" for all their constructions, since they don't have any need for
syntax constructions with computation rules "def f p1 ..pn = t".
[Provided the RGL does.]

> - restore 'pt -paraphrase' because of backward compatibility?
> - restore 'pt -transfer=f' ?

I think they should both be restored (sorry for those who have to do the
work). I disagree with the difference you see between the two:

> - transfer is not a documented feature, and seems to have been removed. This is not a
> violation. I will write a few words about transfer further below.
> - paraphrase is a feature mentioned in the GF book, Appendix E, and has been removed from the
> shell but it is still available in the PGF API. This is a borderline case, as it is a shell
> functionality and not a language feature. But I wonder why it was removed from the shell.

First, -transfer is also mentioned in the GF book, in Section 8.8
(p.195, though not mentioned in the index p.330) at least. Appendix E
does not mention -transfer, but says (p.308)

The following list of commands is a subset of the full set, but it
includes all commands used in this book, giving some representative
example of each.

Since "pt -transfer" is used in the book, p.195, readers may want to
test it, so it should be restored, even if it is not among the
"representative" examples "pt -compute" and "pt -paraphrase".

Second, "pt -transfer=f" is useful in pipes, to apply f to a tree you
don't know yet, like

parse "your favorite sentence" | pt -transfer=f

and I don't see how you could mimic this with "pt -compute". If you
want to write transfer functions, this piping seems the best way to test
them.

> - change "fun" in RGL to "data" ?

First, it is possible: by C.3.5,

The set of def definitions for f can be scattered around the module in
which f is introduced as a function.

and only there, I understand. In the RGL, no abstract module contains a
"fun f : C" *and also* a "def f p = t", so a fun without def differs
from a data only in being viewed as a variable in patterns, and I see no
good reason why one would want this strange behaviour.

Second, it seems the idea was that the abstract syntax of the RGL is a
free algebra. Then one should be able to do pattern matching with trees,
at least if computation with trees inside GF is intended.

"pt -compute t" has to distinguish between symbols f in t according to
"f having a def", "f being a data constructor", and "f being a variable"
anyway. Having more data constructors gives more possible patterns in
def-definitions, and if too many defs are around,
"pt -compute" might become slower than it is now (on new trees).

Q: Can the change from fun to data have effects for dependent types and
their equality?

Regards,

Hans

Aarne Ranta

Mar 9, 2021, 9:06:57 AM
to gf-...@googlegroups.com
Hello Hans,

I think we are in perfect agreement about the main points:

1. data/fun distinction should be kept, since it makes a difference in how pattern matching in def definitions is interpreted
2. RGL should switch into data everywhere, since in this way one could define transfer by pattern matching on the RGL functions
3. data should be allowed to be split to different modules (as it is now for the judgement form 'data f : T')

And thanks for pointing out that 'pt -transfer' indeed occurs in the GF book, even if not in the refman. This is a good reason to re-enable it.

Your example with

  fun F : A -> B ;  data G1 : A1 -> A ; ... ; Gk : Ak -> A ;

is interesting. I am trying to figure out a concrete use case. What comes to my mind is

  fun UseComp' : Comp -> VP ; 
  data CompAP : AP -> Comp ; CompAdv : Adv -> Comp ; ...

where UseComp' would be a variant of UseComp that is different for AP, Adv, etc. One example that comes to my mind is the selection of copula ser/estar in Spanish, which can however also be taken care of by a parameter of Comp, as it is now.

Finally, data/fun vs dependent types: yes, this is crucial, because two types (C t) and (C u) can only be equal if  (t = u) holds, and this is checked by computing them towards canonical form, i.e. data form. If t and u have different data constructors as heads, the equality fails.
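Aarne's point can be illustrated with Hans's Less example. In the hypothetical Python model below, a type such as `Less Zero (Succ y)` is a tuple of the category name and its index trees, and two types are equal when the category names agree and the indices have the same normal form. The `plus` function and its two def rules are assumptions added for illustration, in the style of the GF-book example; `Zero` and `Succ` play the role of data constructors, so indices headed by different constructors can never be made equal.

```python
Tree = tuple  # (function_name, *argument_trees)


def norm(t: Tree) -> Tree:
    """Normalize a hypothetical Nat term: data Zero, Succ; fun plus with two def rules."""
    head, *args = t
    args = [norm(a) for a in args]
    if head == "plus" and len(args) == 2:
        x, y = args
        if x == ("Zero",):
            return y                                    # def plus Zero y = y
        if x[0] == "Succ":
            return ("Succ", norm(("plus", x[1], y)))    # def plus (Succ x) y = Succ (plus x y)
    return (head, *args)                                # data form, or a stuck term


def types_equal(a: Tree, b: Tree) -> bool:
    """(C t1 ... tm) equals (C u1 ... um) iff the C's agree and each ti, ui share a normal form."""
    return (a[0] == b[0] and len(a) == len(b)
            and all(norm(t) == norm(u) for t, u in zip(a[1:], b[1:])))


# The type of LessZ (plus Zero Zero) is Less Zero (Succ (plus Zero Zero)).
lhs = ("Less", ("Zero",), ("Succ", ("plus", ("Zero",), ("Zero",))))
print(types_equal(lhs, ("Less", ("Zero",), ("Succ", ("Zero",)))))   # True
print(types_equal(("Less", ("Zero",), ("Zero",)),
                  ("Less", ("Zero",), ("Succ", ("Zero",)))))         # False: Zero vs Succ
```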

Regards

  Aarne

Hans Leiss

Mar 10, 2021, 11:47:26 AM
to gf-...@googlegroups.com
Hi Aarne,

yes, we agree on 1.-3.,

> 1. data/fun distinction should be kept, since it makes a difference in how
> pattern matching in def definitions is interpreted
> 2. RGL should switch into data everywhere, since in this way one could define
> transfer by pattern matching on the RGL functions
> 3. data should be allowed to be split to different modules (as it is now for
> the judgement form 'data f : T')

and on 3. not only because of Extend, as I thought, but also because
of Lexicon, as you said. (But on 2. see the Remark below).

Besides that, I only now realised the following distinction of functions in the
abstract syntax:

a) transfer functions f : tree -> tree, which do not appear in parse trees:

fun f : A -> C with def f pat1 = t1 | ... | f patn = tn
but *without* linearization in concrete grammars

b) computable f : tree -> tree that may appear in parse trees:

fun f : A -> C with def f pat1 = t1 | ... | f patn = tn
but *with* linearization (lin f) in concrete grammars

Those under a) (like active2passive, digits2numeral, aggregate), remind me of
Chomsky's transformational grammar, in so far as he claims that relations between
constructions are an important part of natural language grammars --similar to
equivalences between formulas in predicate logic; maybe Chomsky made a stronger
claim and wanted to use active2passive to save writing passive constructions.

Those under b) give us a (restricted?) way to linearize by case
distinction over the abstract syntax of type A in the sense of

lin (f a) = case a of { pat1 => lin t1 | ... |
patn => lin tn | x => (lin f)(lin x) }

*if* we use

pt -compute (f a) | linearize

rather than

linearize (f a) = (lin f)(lin a),

the default for linearizing (f a). Maybe your idea with UseComp can be treated this way.

What I had in mind was related to pronouns: these can *often* be treated as if
they were full noun phrases, but *not always*.

1. If we consider personal and possessive pronouns as forms of a single Pron,
then UsePron : Pron -> NP only works for the substantive usage of NPs. The
(det/adjectival) possessive usage of NPs is expressed by the Saxon genitive:

PossNP : NP -> CN -> NP -- Bill's car | a young girl's sweet dreams |
-- the greengrocers' apples |
-- the greengrocer's apostrophe ("trouser's reduced")

As there is no genitive of pronouns in Eng, the (det/adjectival) possessive
usage of Prons has to be expressed by special forms:

PossPron : Pron -> CN -> NP -- his car | her sweet dreams | their apples

So I had in mind that the (linearization of) UsePron : Pron -> NP could omit
the possessive forms of (lincat Pron), while the rare(?) possessive usage would
need to have access to them. So, the general rule should be

fun Poss : NP -> CN -> NP ;
def Poss (UsePron p) cn = PossPron p cn | Poss np cn = PossNP np cn ;
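Because def rules are tried in order, the specific UsePron case has to precede the catch-all, or the catch-all would always fire. A minimal Python transcription makes the ordering explicit; it is a hypothetical model with tuples for trees, and `car_N`, `bill_PN` and the use of `UsePN` are only example leaves, not part of Hans's rule.

```python
Tree = tuple  # (function_name, *argument_trees)


def poss(np: Tree, cn: Tree) -> Tree:
    """Sketch of the transfer function Poss : NP -> CN -> NP.

    def Poss (UsePron p) cn = PossPron p cn   -- tried first
        Poss np cn          = PossNP np cn    -- else-case, tried last
    """
    if np[0] == "UsePron":
        return ("PossPron", np[1], cn)
    return ("PossNP", np, cn)


car = ("UseN", ("car_N",))
print(poss(("UsePron", ("he_Pron",)), car))  # ('PossPron', ('he_Pron',), ('UseN', ('car_N',)))
print(poss(("UsePN", ("bill_PN",)), car))    # ('PossNP', ('UsePN', ('bill_PN',)), ('UseN', ('car_N',)))
```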

Note 1: this is a variation of "def F (G1 a1) = f1 a1 | F (G2 a2) = f2 a2" of
my previous message, with a final else-case: F a = f a. But in contrast to
what I said there, f1, f2 and f are not auxiliary constructors, but (probably)
data constructors, while F is auxiliary. Done this way, F is just a transfer
function, a case of a) above, and will not appear in parse trees (bad).

Note 2: a slightly different implementation is made in PronNPGer below.
Identifying Poss with PossNP, we can give PossNP a linearization to handle the
default (using np.s!Gen) and give it a computation rule (without else-case for
the default) to handle exceptions:

fun PossNP : NP -> CN -> NP ;
def PossNP (UsePron p) cn = PossPron p cn ; -- exception

This makes PossNP an instance of b) above: it appears in parse trees of a
non-pronoun np (good) and incorrectly uses the genitive form of a personal
pronoun (bad), but it can no longer be a data constructor (bad).

Remark: If all constructions (fun F) in the RGL are turned into (data F), we
cannot add computation rules (def F pat) to add corrections for special cases.
This may be an argument against turning all fun to data. A way out would be to
improve an existing (data F) of the RGL by adding a new (fun F') with
(def F' pat) for exception-cases, and use F in the else-case of F'.

2. As in the RGL one can use a flag isPron:Bool in (lincat NP), to *modify* the
NP-linearization for a Pron, but one cannot *exclude* a Pron from being
used in a rule intended for non-pron NPs. [The correct basic distinction is
between Pron and NonPron, and NP is just a simplifying category for (Pron |
NonPron) that saves duplicating common rules, with a price paid by admitting
any np in non-common rules.] Using abstract functions of type b) above reduces
this price, I think.

We sometimes have to implement special cases, which do not necessarily
have to do with access to rarely used information in linearizations.
For example, even if Pron is just the personal pronoun, ditransitive
verbs want to do a "pronoun switch":

besides the acc < (to-)dat ordering
I gave (the book:acc) (to the girl:dat) [ComplAccDatV3]
I gave (it:acc) ((to) her:dat)
one can say
I gave (the girl:dat) (the book:acc) [ComplDatAccV3]
I gave (her:dat) (the book:acc)
but it is wrong to say (with unstressed "it")
* I gave (the girl:dat) (it: acc)
* I gave (her:dat) (it:acc)

I'm not absolutely sure about the details, but at least the (it:acc) must be put
before a dative object without "to". If we had a ternary VP-construction

fun ComplDatAccV3 : V3 -> NP -> NP -> VP ;

for putting the indirect object (without "to") before the direct one, one
could implement the switching of the acc-pronoun in front by a computation
rule

def ComplDatAccV3 v np (UsePron q) = ComplAccDatV3 v (UsePron q) np ;

and use the default-behaviour of ComplDatAccV3 only for non-pronoun direct objects.

Implementing this pronoun switch in GF using the isPron flags in the
NP-linearizations is a bit cumbersome. GF first embeds one of the objects, by

Slash2V3 : V3 -> NP -> VPSlash ; -- give it (to her)
Slash3V3 : V3 -> NP -> VPSlash ; -- give (it) to her

and later the other one, by

ComplSlash : VPSlash -> NP -> VP ; -- love it

Even though the lincat of the vps:VPSlash constructed in the first step
remembers whether the direct or the indirect object is still missing, it has
(as far as I remember) glued the string np1.s of the np1 it received into the
string vps.s, and hence np1.s and np2.s cannot be exchanged in (ComplSlash vps np2).
[Sorry, this may be wrong, since you use an nn-field in vps, but something like this
happened with objects of embedded infinitives that I needed to move around,
or it had to do with remembering in vps whether np1 was a pronoun.]
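A simplified Python model of why gluing matters. It is not the RGL's actual VPSlash lincat, and the field names are hypothetical. Once the first object has been concatenated into one string, ComplSlash can only append the second object. If the first object is kept in its own field, ComplSlash can still choose the order.

```python
from dataclasses import dataclass


def compl_slash_glued(vps_s: str, obj: str) -> str:
    """VPSlash as a single string: the first object is already glued in,
    so the second object can only be appended."""
    return f"{vps_s} {obj}"


@dataclass(frozen=True)
class VPSlash:
    verb: str
    dat: str  # indirect object kept in its own field (cf. an nn-like field)


def compl_slash(vps: VPSlash, obj: str, obj_is_pron: bool) -> str:
    """With separate fields the order can still be decided here."""
    if obj_is_pron:
        return f"{vps.verb} {obj} to {vps.dat}"
    return f"{vps.verb} {vps.dat} {obj}"
```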

Two years ago, I had implemented a similar pronoun-switch in VerbGer, using a field

nn : Agr => Str * ... * Str

that kept inserted objects/complements in separate fields

<refl or pron, other np, pp, compls, obj. of infinitives, embedded infinitives>

It can be done, but adding one or two Boolean flags to (lincat NP) (isPron,
isHeavy, isDefinite, ...) considerably increased the size of the compiled grammar
(too many Slash(2|3)V3 rules, I think), and finally made compilation
impossible on my notebook.

I wonder whether constructions with computation rules would lead to an easier
implementation, as sketched under 1. above. In the RGL case, one would need
patterns that look deeply into the argument vp of

PredVP : NP -> VP -> Cl

to see whether this vp was constructed using (UsePron p) as np-argument of
embedded (Compl np vp)s and (Slash*V3 v np)s.

Other examples of such exceptions might be to exclude misuses of (PredVP np
vp) in which the vp contains a reflexive that does not agree with the np --
but no, this cannot be blocked on the abstract level.

Summary: writing grammar rules by case distinction on the abstract syntax is
to some extent possible in GF, and maybe useful for exceptional cases. (I
always missed this.) [All this needs "pt -compute", as explained in b) and
shown in the PronNPGer example below.]

The price, you may say, is non-compositionality: the linearization of a tree t
is then not computed from the linearization of t's *direct* subtrees, but from
the linearizations of *indirect* subtrees (or trees built from those) instead.

Sorry this got a bit long, but I wanted to be as precise as needed.

Hans

P.S. The command "pt -transfer=f tree" internally used -compute to go down to
subtrees that match one of the patterns of "def f pat1 = t1 | ... | f patn =
tn". The tree t resulting from a parse typically does not match these patterns:
t may be an Utt, but active2passive : Cl -> Cl applies to clauses. Even if we
are given a tree t, we don't want to insert f in front of suitable subtrees by
hand and then use -compute.

------------------------- Example ------------------------------- 10.3.2021 HL

--------------------------------PronNP.gf ------------------------------------

abstract PronNP = {

flags
startcat = NP ;
cat
Det ; CN ; Pron ; NP ;
data
DetCN : Det -> CN -> NP ;
UsePron : Pron -> NP ;
PossPron : Pron -> CN -> NP ;
fun
PossNP : NP -> CN -> NP ; -- default: des Sohnes Hund |
def -- *meiner Hund
PossNP (UsePron p) cn = PossPron p cn ; -- exception: mein Hund

-- Lexicon
data
Der : Det ;
Hund, Sohn : CN ;
Ich : Pron ;

}

--------------------------------PronNPGer.gf ---------------------------------

concrete PronNPGer of PronNP = {

lincat
NP = { s : Case => Str } ;
Pron = { s : Case => Str ; poss : AForm => Str } ;
CN, Det = { s : Case => Str } ; -- simplified
lin
DetCN det cn = { s = \\c => det.s!c ++ cn.s!c } ;
UsePron p = { s = p.s } ; -- p as personal pronoun
PossPron p cn = { s = \\c => p.poss ! (AF c) ++ cn.s ! c } ;

PossNP np cn -- default : possessive genitive, except for np=pron
= { s = \\c => np.s ! Gen ++ cn.s ! c } ;


Der = { s = table { Nom => "der" ; Gen => "des" ;
Dat => "dem" ; Acc => "den" } };
Ich = { s = table { Nom => "ich" ; Gen => "meiner" ;
Dat => "mir" ; Acc => "mich" };
poss = table AForm { AF Nom => "mein" ; AF Gen => "meines" ;
AF Dat => "meinem" ; AF Acc => "meinen" }
} ;
Hund = { s = table { Nom|Acc => "Hund" ; Gen => "Hundes" ;
Dat => "Hunde" } } ;
Sohn = { s = table { Nom|Acc => "Sohn" ; Gen => "Sohnes" ;
Dat => "Sohne" } } ;

param
Case = Nom | Gen | Dat | Acc ;
AForm = AF Case ; -- short for: AF Gender Number Case

}

--------------- Parsing, linearization, and tree normalization ----------------------

PronNP> p -tr "der Sohn" | l -table
DetCN Der Sohn

s Nom : der Sohn
s Gen : des Sohnes
s Dat : dem Sohne
s Acc : den Sohn

PronNP> p -tr "ich" | l -table
UsePron Ich

s Nom : ich
s Gen : meiner
s Dat : mir
s Acc : mich

PronNP> p -tr "des Sohnes Hund" | l -table
PossNP (DetCN Der Sohn) Hund

s Nom : des Sohnes Hund
s Gen : des Sohnes Hundes
s Dat : des Sohnes Hunde
s Acc : des Sohnes Hund

-- incorrect input, accepted by default construction:
PronNP> p -tr "meiner Hund" | l -table
PossNP (UsePron Ich) Hund

s Nom : meiner Hund
s Gen : meiner Hundes
s Dat : meiner Hunde
s Acc : meiner Hund

-- correct input, accepted by special construction:
PronNP> p -tr "mein Hund" | l -table
PossPron Ich Hund

s Nom : mein Hund
s Gen : meines Hundes
s Dat : meinem Hunde
s Acc : meinen Hund

-- incorrect input, translated to correct output by tree normalization (pt -compute):
PronNP> p -tr "meiner Hund" | pt -compute -tr | l -table
PossNP (UsePron Ich) Hund

PossPron Ich Hund

s Nom : mein Hund
s Gen : meines Hundes
s Dat : meinem Hunde
s Acc : meinen Hund

-- normalization changes indirect subtrees (only):
PronNP> p -tr "meiner Sohnes Hundes Sohn" | pt -compute -tr | l -table
PossNP (PossNP (PossNP (UsePron Ich) Sohn) Hund) Sohn

PossNP (PossNP (PossPron Ich Sohn) Hund) Sohn

s Nom : meines Sohnes Hundes Sohn
s Gen : meines Sohnes Hundes Sohnes
s Dat : meines Sohnes Hundes Sohne
s Acc : meines Sohnes Hundes Sohn

-- normalization leaves non-possessive use of pronouns intact:
PronNP> p -tr "ich" | pt -compute -tr | l -table
UsePron Ich

UsePron Ich

s Nom : ich
s Gen : meiner
s Dat : mir
s Acc : mich

----------------------------- Example. ----------------------------------------

Hans Leiss

unread,
Mar 14, 2021, 12:50:13 PM
to gf-...@googlegroups.com
I have to correct the final sentence of my

> Remark: If all constructions (fun F) in the RGL are turned into (data F), we
> cannot add computation rules (def F pat) to add corrections for special cases.
> This may be an argument against turning all fun to data. A way out would be to
> improve an existing (data F) of the RGL by adding a new (fun F') with
> (def F' pat) for exception-cases, and use F in the else-case of F'.

Suppose we change an existing (fun F : A -> C ) in the RGL to (data F : A -> C)
and add a new abstract function

fun F' : A -> C; def F' exception-pat = t | F' else = F else ;

Then any input with a parse tree a:A matching the exception-pattern would,
besides (F' a), also have a parse tree (F a) : C, and "pt -compute (F a)"
would leave this (unwanted) parse tree (F a) unchanged. [We would get

lin (F' a) = lin t, instead of (lin F')(lin a),

but not 'correct' (lin (F a)) to (lin t).]

So, unfortunately, refining existing abstract functions F : A -> C as I
intended can only be done by adding computation rules *for F itself*,

def F exception-pat = t ; [ giving (lin F exception) = lin t,
(lin F else) = (lin F)(lin else) ]

which means that F must have a fun-declaration, not a data-declaration.
This is possible with the existing RGL, but no longer when change
2. below is made.

Hans