lupu...@gmail.com writes:
> It depends on the understanding of a "word". Let's consider three
> sentences:
> (1) I crossed the street.
> (2) I have crossed the street.
> (3) I shall cross the street.
>
> From my point of view all three are syntactically identical: S + V +
> O, and therefore they must have the same dependency tree.
> For historical reasons the verb in (1) is written without spaces,
> while in (2) and (3) it contains spaces. But that is not a matter of
> syntax, only a matter of orthography.
You can put words in between them, and this works productively:
I have not ever crossed the street.
I shall not ever cross the street.
I will not ever cross the street.
You can utter them in isolation:
I have.
I shall.
I will.
A native speaker asked to read slowly will pause before and after "have"
(but not between "cross" and "ed"). And I'm sure a phonetician would be
able to point out the phonetic boundaries and show how they correspond
with other things we call words in the language, but not with any
within-word contour.
Word boundaries can be a difficult question sometimes, but not in this
case.
[…]
> But it's only my opinion and your mileage may vary.
> All I propose is to add to CG3 a syntactic construction permitting
> one to read the wordform of the cohort and the baseform of the
> reading found by the current contextual test.
You can split it into three modules:
First you have a vislcg3 file containing rules like
LIST merge-if-before-verb = "<shall>" "<will>" "<have>" ; # etc
ADD (mergeright) merge-if-before-verb (1 (verb)) ;
Then a script (awk/perl/what have you) to turn
"<will>"
"will" verb pres mergeright
"<cross>"
"cross" verb inf
into
"<will cross>"
"will cross" verb pres verb inf
or whatever you feel it should be, and then your dependency CG.
Assuming input to the first module is disambiguated
(one-reading-per-cohort), that's a rather simple script.
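The merging step left to "awk/perl/what have you" above can be sketched in Python. This is only an illustration under the stated assumption of one reading per cohort; the cohort layout and the `mergeright` tag follow the example above, and the function names here are made up for the sketch.

```python
def parse_cohorts(lines):
    """Group a CG stream into [wordform_line, reading_lines] pairs.

    Lines starting with "< open a new cohort; indented non-empty
    lines are readings belonging to the most recent cohort.
    """
    cohorts = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith('"<'):        # cohort wordform, e.g. "<will>"
            cohorts.append([line, []])
        elif line.strip() and cohorts:   # indented reading line
            cohorts[-1][1].append(line)
    return cohorts

def merge_mergeright(cohorts):
    """Merge each cohort whose reading carries 'mergeright' with the
    cohort that follows it, concatenating wordforms and baseforms."""
    out = []
    i = 0
    while i < len(cohorts):
        wf, readings = cohorts[i]
        if i + 1 < len(cohorts) and any(
                "mergeright" in r.split() for r in readings):
            wf2, readings2 = cohorts[i + 1]
            # "<will>" + "<cross>" -> "<will cross>"
            merged_wf = '"<' + wf[2:-2] + " " + wf2[2:-2] + '>"'
            r1 = readings[0].strip()
            r2 = readings2[0].strip()
            # drop the marker tag, keep the rest of both tag strings
            tags1 = [t for t in r1.split() if t != "mergeright"]
            base1 = tags1[0].strip('"')
            base2 = r2.split()[0].strip('"')
            merged = '\t"{} {}" {} {}'.format(
                base1, base2,
                " ".join(tags1[1:]),
                " ".join(r2.split()[1:]))
            out.append([merged_wf, [merged]])
            i += 2
        else:
            out.append(cohorts[i])
            i += 1
    return out
```

On the example input above, `merge_mergeright(parse_cohorts(...))` turns the `"<will>"` / `"<cross>"` pair into a single `"<will cross>"` cohort with the reading `"will cross" verb pres verb inf`, ready for the dependency CG.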