Hi,
I'm trying to use ud2gf in a project, and I'm running into some issues and also have some general questions.
Since this post is rather long, I have marked my actual questions with [QUESTION]. If you can answer just one question, it is much appreciated. (Feel free to also comment on things that are not explicitly questions, if you want to.)
0. Background: what am I trying to do?
I want to experiment with ud2gf as follows:
From a large corpus, automatically derive an abstract syntax: an embedded application grammar on top of the very core bits of the RGL.
Let me give an example. Suppose we have a UD tree like this:
4 small small ADJ 0 root
2 cat cat NOUN 4 nsubj
1 the the DET 2 det
3 is be AUX 4 cop
Now, assume that I'm running ud2gf with just the standard Lang* from the RGL. One of the intermediate steps of ud2gf is as follows:
[4] small 4 (3) ADJ root (small_A : A[4] ; ComparA small_A : AP[4])
[1,2] cat 2 (1) NOUN nsubj (DetN the_Det (UseN cat_N) : NP[1,2])
[1] the 1 (2) DET det (the_Det : Det[1])
[3] is 3 (2) AUX cop (??????)
Same step, but now I switch to using my new grammar. The UD relations become GF categories in my grammar—more details right after the example.
[4] small … root (small_A : A[4] ; … ; rootA_ (PositA small_A) : root[4])
[1,2] cat … nsubj (DetN the_Det (UseN cat_N) : NP[1,2] ; nsubj_ (DetN the_Det (UseN cat_N)) : nsubj[1,2])
[1] the … det (the_Det : Det[1]) -- uninteresting, because not immediate child of root
[3] is … cop (be_cop : cop[3])
The trees in boldface have the same category as the dependency label of the word in the original sentence. This is the starting point of my grammar: structures of the form "root and its immediate children".
Every distinct combination of root's immediate children that I find in my corpus will become the type signature of a corresponding GF function. Whether my corpus contains "the cat is small" or "the black cat is small", the structure at that level is the same, and it will become the following GF function:
root_nsubj_cop : root -> nsubj -> cop -> UDS ;
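A minimal sketch of this derivation step in Python, to make the idea concrete. It assumes the six-column layout of the example above (id, form, lemma, POS, head, label) rather than full CoNLL-U. The function names are made up for illustration, and replacing ":" in labels such as nsubj:pass with "_" is only my assumption about how to get valid GF identifiers.

```python
from typing import List, Tuple

# One row per token, in the six-column layout used in this post:
# id, form, lemma, POS, head, label (a simplification of full CoNLL-U).
Token = Tuple[int, str, str, str, int, str]

def parse_rows(text: str) -> List[Token]:
    """Parse whitespace-separated rows into tokens."""
    tokens = []
    for line in text.strip().splitlines():
        idx, form, lemma, pos, head, label = line.split()
        tokens.append((int(idx), form, lemma, pos, int(head), label))
    return tokens

def root_signature(tokens: List[Token]) -> str:
    """Build a GF function signature from root and its immediate children."""
    root = next(t for t in tokens if t[4] == 0)
    # Immediate children of root, in word order (tuples sort by id first).
    children = sorted(t for t in tokens if t[4] == root[0])
    # Assumption: ':' is not allowed in GF identifiers, so replace it.
    labels = ["root"] + [t[5].replace(":", "_") for t in children]
    return f"{'_'.join(labels)} : {' -> '.join(labels)} -> UDS ;"

example = """
4 small small ADJ 0 root
2 cat cat NOUN 4 nsubj
1 the the DET 2 det
3 is be AUX 4 cop
"""
signature = root_signature(parse_rows(example))
print(signature)
```

"the black cat is small" gives the same signature, because "black" attaches to "cat" and not to the root.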
In addition, I will have a (manual?) list of coercions and (language-specific) function words, that could look like this:
rootA_ : AP -> root ; -- same for rootV_, rootN_ …
nsubj_ : NP -> nsubj ;
be_cop : cop ;
have_aux : aux ;
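Put together, the generated grammar could be rendered along these lines. This is only a sketch: the module name UDApp and the helper render_abstract are hypothetical, and extending Lang with UDS as the start category is my assumption about what "on top of the RGL" would look like.

```python
from typing import List

def render_abstract(name: str, new_cats: List[str], funs: List[str]) -> str:
    """Render a GF abstract syntax module that extends the RGL's Lang."""
    lines = [
        f"abstract {name} = Lang ** {{",
        "  flags startcat = UDS ;",
        "  cat " + " ; ".join(new_cats) + " ;",
        "  fun",
    ]
    lines += [f"    {fun}" for fun in funs]
    lines.append("}")
    return "\n".join(lines)

module = render_abstract(
    "UDApp",  # hypothetical module name
    # Only the new categories; AP and NP come from the RGL.
    ["UDS", "root", "nsubj", "cop", "aux"],
    [
        "root_nsubj_cop : root -> nsubj -> cop -> UDS ;",  # derived from the corpus
        "rootA_ : AP -> root ;",                           # coercions
        "nsubj_ : NP -> nsubj ;",
        "be_cop : cop ;",                                  # function words
        "have_aux : aux ;",
    ],
)
print(module)
```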
I would like to use the core RGL as much as possible, so I don't need to have different "UD layer" functions for "a cat", "my three cats" and whatnot. In the picture below is the AST (-nofuns) for "the cat is small", and I would like to ignore everything under the line "root nsubj cop".
1. Nonlocal definitions
Adding labels for my custom funs was trivial (root_nsubj_cop has the labels "head nsubj cop"), so I started parsing happily. ud2gf could parse "the cat is small", but not "my hovercraft is full of eels". I found out why: DetCN had the following annotation:
#fun DetCN det head -- Det -> CN -> NP
And "my" is not a det, but an nmod:poss.
The documentation I was reading says that a generalized form of this is a **nonlocal annotation**, such as #fun AdvVP _ PrepNP > head obl
I wrote this in my labels file, but no luck.
#fun DetCN _ PossPron > nmod:poss head -- my hovercraft
#fun DetCN det head
In the same document, I also saw the #change syntax:
#change det > nmod:poss features Poss=Yes|PronType=Prs
That didn't work either, and gfud check-annotations even complained about bad syntax.
Alternative function: #altfun
#altfun ComplV2 head obl
This is needed in ud2gf for reading normal UD, because the complement of a V2 verb can be labelled either obj or obl depending on the case governed by the verb.
So I tried putting this in my labels file:
#fun DetCN det head
#altfun DetCN nmod:poss head
But nothing happens, and gfud check-annotations gives "unknown function '#altfun'".
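To spell out what I expected #altfun to do, here is a toy model in Python. It is not how ud2gf is implemented: the dictionary layout and matching_funs are hypothetical, and the model ignores GF categories entirely, so it says nothing about the PossPron step that "my" would also need.

```python
from typing import Dict, List

# Hypothetical in-memory form of a labels file: each function maps to one or
# more alternative label sequences (the first from #fun, the rest from #altfun).
Annotations = Dict[str, List[List[str]]]

def matching_funs(annotations: Annotations, labels: List[str]) -> List[str]:
    """Return the functions that have an alternative equal to the observed labels."""
    return [fun for fun, alts in annotations.items() if labels in alts]

only_fun = {"DetCN": [["det", "head"]]}
with_altfun = {"DetCN": [["det", "head"], ["nmod:poss", "head"]]}

the_cat = ["det", "head"]              # "the cat"
my_hovercraft = ["nmod:poss", "head"]  # "my hovercraft"

print(matching_funs(only_fun, the_cat))           # DetCN applies
print(matching_funs(only_fun, my_hovercraft))     # nothing applies
print(matching_funs(with_altfun, my_hovercraft))  # DetCN applies again
```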
[QUESTION]
* What is the current syntax for doing nonlocal abstract mappings?
* Are some/all of the syntaxes correct, but ud2gf just doesn't support nonlocal abstract mappings?
2. Do I need #morpho mappings?
[QUESTION]
a) Do I need to add #morpho mappings?
b) If so, can it be automated?
…
z) Why are they important?
3. Do I need #auxfun?
My hope was that I could get away with as few "weird" extra functions as possible. I will add aux, cop and other such closed classes into my grammar, but ideally I would not want to deal with specialised versions of syntactic functions. My goal is not to produce RGL trees, so all that extra clutter of closed classes may stay in my trees.
Suppose that I'm parsing a sentence like "knowing that you didn't drink tea": I don't even care if I lose the gerund in "know", or the tense and polarity in "you drink tea". If I have a subtree as an RGL Cl or VP, I can always re-introduce whatever inflections I need later.
(One thing I do care about is that my APs don't have superfluous ComparA or OrdSuperl–I don't want "full of eels" to be randomly changed into "fullest of eels".)
With this information about my needs, [QUESTION] do you think I need to figure out how to use #auxfun?
Sorry about the long post—if there is documentation somewhere that I have missed, please direct me to it. If there is no such documentation, I'll be glad to contribute to improving the current docs.
Cheers,
Inari