Unification in complex contexts

Atro Voutilainen

Mar 6, 2015, 7:11:25 AM
to constrain...@googlegroups.com, Atro Voutilainen
Hi,

I wrote a rule to SELECT two coordinated finite verb readings (on the basis of a uniqueness generalisation), but the rule disambiguated only the first one (the target). The grammar looks something like this:

LIST vfin = (V Pres) (V Imp) (V Past) ;

SELECT $$vfin IF (*-1C clb BARRIER vfin) (*1C (CC) BARRIER vfin OR clb LINK 1 $$vfin LINK *1C clb BARRIER vfin) ;

Is this something to look at? I can provide a concrete example if needed.

Best,
Atro

Tino Didriksen

Mar 6, 2015, 7:19:16 AM
to constrain...@googlegroups.com
SELECT can only affect one cohort at a time, namely the target. If you also want to remove readings from the paired cohort, you need another SELECT rule that targets it and searches backwards.
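
A minimal sketch of what such a backward-searching companion rule might look like, mirroring the rule from the question. It assumes the vfin list and the clb set from the grammar quoted above, it has not been run against real data, and the barrier on the backward scan is deliberately simplified to clb only.

```cg3
# Companion to the forward rule: the target is now the second verb.
# -1C (CC): the cohort immediately to the left is an unambiguous coordinator.
# *-1 $$vfin: scan left from the coordinator for a cohort carrying the same
#             member of vfin as the target (set unification).
# The remaining tests mirror the clause-boundary checks of the original rule.
SELECT $$vfin IF
    (*1C clb BARRIER vfin)
    (-1C (CC) LINK *-1 $$vfin BARRIER clb LINK *-1C clb BARRIER vfin) ;
```

With both rules in the grammar, the forward rule disambiguates the first verb and this one disambiguates the second.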

-- Tino Didriksen

Atro Voutilainen

Mar 6, 2015, 7:22:19 AM
to constrain...@googlegroups.com
Thanks!
Atro


--
You received this message because you are subscribed to the Google Groups "Constraint Grammar" group.
To unsubscribe from this group and stop receiving emails from it, send an email to constraint-gram...@googlegroups.com.
To post to this group, send email to constrain...@googlegroups.com.
Visit this group at http://groups.google.com/group/constraint-grammar.
For more options, visit https://groups.google.com/d/optout.

JOSE MARIA ARRIOLA

Aug 31, 2016, 11:18:11 AM
to constrain...@googlegroups.com, ARRIOLA EGURROLA, JOSE MARIA

Hi,
When applying the CG3 disambiguation grammar for Basque, I got the following message:
Warning: Hard limit of 500 cohorts reached at line 2.219 - forcing break.

What does it mean?

Thank you very much.

Jose Mari Arriola



Tino Didriksen

Aug 31, 2016, 11:24:21 AM
to constrain...@googlegroups.com, ARRIOLA EGURROLA, JOSE MARIA
It means that 500 consecutive words (cohorts) of the input went by without any of them matching the Delimiters you defined in the grammar.

This usually means the input is malformed somehow (very long run-on sentences), or your delimiters are underspecified.

If there truly are no valid hard delimiters, you can use Soft-Delimiters to introduce some sort of structure (usually commas or similar soft line-of-thought breaks).
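
A minimal sketch of how hard and soft delimiters might be declared at the top of a grammar. The wordform strings are assumptions for illustration; they have to match the wordforms exactly as they appear in your own input stream.

```cg3
# Hard delimiters: a window (sentence) ends after any cohort with one of these wordforms.
DELIMITERS = "<$.>" "<$!>" "<$?>" ;

# Soft delimiters: candidate break points for windows that have grown long
# without hitting a hard delimiter.
SOFT-DELIMITERS = "<$,>" ;
```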

-- Tino Didriksen

JOSE MARIA ARRIOLA

Sep 1, 2016, 6:06:46 AM
to constrain...@googlegroups.com
Ok. Thank you very much.
JM

Edward Garrett

Sep 1, 2016, 6:11:34 AM
to Constraint Grammar
I have run into this issue myself. Can the hard limit be changed? If so, does performance diminish considerably as this limit is increased?

I ask because it can be convenient for some purposes, and some languages or registers, to avoid defining the "sentence", and to break texts into considerably larger chunks.

Tino Didriksen

Sep 1, 2016, 6:18:26 AM
to constrain...@googlegroups.com
You can set your own limits with the options:
     --soft-limit           number of cohorts after which the SOFT-DELIMITERS kick in; defaults to 300
     --hard-limit           number of cohorts after which the window is forcefully cut; defaults to 500

Performance will depend on what kind of rules you have. If they indiscriminately scan the whole "sentence", that can take a while. If they are limited to closer contexts, it makes no difference how big the sentence is.
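
To illustrate the difference, here is a small sketch contrasting an unbounded scan with more local contexts. The tags (N, V, Det, Prep) are hypothetical and the rules are not meant to be linguistically sensible; only the shape of the contextual tests matters.

```cg3
# Unbounded scan: *-1 may walk all the way back to the start of the window,
# so the cost of this test grows with the window size.
SELECT (N) IF (*-1 (Det)) ;

# Bounded scan: the BARRIER stops the search at the first verb,
# so the test usually ends after a few cohorts.
SELECT (N) IF (*-1 (Prep) BARRIER (V)) ;

# Purely local context: only the immediately preceding cohort is inspected,
# regardless of how large the window is.
SELECT (N) IF (-1C (Det)) ;
```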

-- Tino Didriksen

JOSE MARIA ARRIOLA

Sep 19, 2016, 11:55:38 AM
to constrain...@googlegroups.com

Hi,
We have been exploring the differences between applying the grammar on its own and applying it inside the tagging module, and we have found a problem that occurs only inside the tagging module. Some offset marks are included with the delimiters, for instance:
"<$.>"<PUNT_PUNT#7824-7824#p47#>" instead of "<$.>"<PUNT_PUNT>"
The #7824-7824#p47# tag carries information about the line number and paragraph number. The problem is that we get the following message:

Warning: Hard limit of 500 cohorts reached at line 2.219 - forcing break.

This happens because the delimiters are not recognized. We define the delimiters statically in our grammar. Is it possible to define them by means of a regular expression?

Thank you,


Jose Mari Arriola

Tino Didriksen

Sep 19, 2016, 12:51:40 PM
to Constraint Grammar, josemari...@ehu.eus
Instead of embedding the information into the tag itself, you can put it as static information on the cohort, à la:
"<$.>" PUNT_PUNT#7824-7824#p47#

That is, the wordform followed by a space and then any tags you want. The caveat is that the tags will be visible to rules, but if they're esoteric enough then it won't matter.

But yes, you can also use regex for delimiters, if you so desire.

-- Tino Didriksen

JOSE MARIA ARRIOLA

Sep 19, 2016, 4:14:42 PM
to constrain...@googlegroups.com
Thank you very much, Tino.
Jose Mari