combo hangs on negation sometimes


paul

Aug 5, 2013, 9:29:36 PM
to chib...@googlegroups.com
I'm trying to rerun some combo searches that ran successfully a year ago but
haven't been used since then. I've observed identical behavior on Windows XP,
Ubuntu 12.04, and Arch Linux with CLAN 05-Aug-2013 and the version before it.

It seems that combo is hanging when encountering the negation operator "!" in
certain contexts. For example:

  combo +t'*CHI' +t%mor +t%xgra +s'!*:wh|*^!*?' +d1 ~/corpora/childes/Valian/01a.cha

is intended to filter out utterances containing wh questions, although it's
unclear to me exactly how to parse that search string (I didn't write it).

The same thing happens on a simpler combo line like

  combo @ +t'*CHI' +t%mor +t%xgra +s'!xxx' +d1 ~/corpora/childes/Valian/01a.cha

though I realize this could be rewritten with kwal.

In both cases combo never gets past

  combo +t*CHI +t%mor +t%xgra +s!xxx +d1 /home/paul/corpora/childes/Valian/01a.cha
  Mon Aug  5 20:43:10 2013
  combo (05-Aug-2013) is conducting analyses on:
    ONLY speaker main tiers matching: *CHI;
      and those speakers' ONLY dependent tiers matching: %MOR; %XGRA;
  ****************************************
  From file <01a.cha>

After poking around a little with gdb and enabling the debug print statement in
combo.cpp:findmatch I get

  combo +t*CHI +t%mor +t%xgra +s!xxx +d1 /home/paul/corpora/childes/Valian/01a.cha
  Mon Aug  5 20:54:54 2013
  combo (05-Aug-2013) is conducting analyses on:
    ONLY speaker main tiers matching: *CHI;
      and those speakers' ONLY dependent tiers matching: %MOR; %XGRA;
  ****************************************
  From file <01a.cha>
  1; pat=xxx;wild=0;origmac->neg=1;txt=tape it up and two tape players .       %mor: v|tape pro|it adv:loc|up coord|and det:num|two n|tape n|play&dv-agt-pl   .  %xgra: 1|4|coord 2|1|obj 3|1|jct 4|0|root 5|6|quant 6|4|coord 7|6|jct  8|4|punct
  1; pat=xxx;wild=0;origmac->neg=1;txt=tape it up and two tape players .       %mor: v|tape pro|it adv:loc|up coord|and det:num|two n|tape n|play&dv-agt-pl   .  %xgra: 1|4|coord 2|1|obj 3|1|jct 4|0|root 5|6|quant 6|4|coord 7|6|jct  8|4|punct
  1; pat=xxx;wild=0;origmac->neg=1;txt=tape it up and two tape players .       %mor: v|tape pro|it adv:loc|up coord|and det:num|two n|tape n|play&dv-agt-pl   .  %xgra: 1|4|coord 2|1|obj 3|1|jct 4|0|root 5|6|quant 6|4|coord 7|6|jct  8|4|punct
  ... and so on until killing the process.

It appears that at some point in the file it stops moving across word
boundaries/consuming input tokens and gets stuck. Note that "tape it up and
two tape players" is not the first utterance in the file.
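
When rerunning a batch of old searches, a hang like this can be caught with a wall-clock limit instead of a manual kill. The following is a minimal sketch, not part of CLAN: the helper name run_with_timeout is hypothetical, and it assumes the command (for example combo) is on PATH and is passed as an argument list, so no shell quoting is needed.

```python
import subprocess

def run_with_timeout(cmd, seconds):
    """Run cmd (a list of arguments) and return (finished, stdout).

    finished is False when the process was still running after
    `seconds` and had to be killed, which is how a hang shows up.
    """
    try:
        done = subprocess.run(cmd, capture_output=True, text=True, timeout=seconds)
        return True, done.stdout
    except subprocess.TimeoutExpired as exc:
        # subprocess.run kills the child before re-raising the timeout.
        out = exc.stdout or ""
        if isinstance(out, bytes):  # partial output may arrive undecoded
            out = out.decode(errors="replace")
        return False, out

# Hypothetical usage with the arguments from the post above:
# finished, out = run_with_timeout(
#     ["combo", "+t*CHI", "+t%mor", "+t%xgra", "+s!xxx", "+d1", "01a.cha"], 60)
# if not finished:
#     print("combo did not finish within 60 seconds")
```

The 60-second limit is an arbitrary choice; pick one comfortably above the normal run time for the corpus in question.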

Searches like +s'!xxx^yyy' and +s'xxx^!yyy' run to completion.

Anyway, I'm not sure if this is a bug or maybe an abuse of deprecated syntax or
something, but any advice would be appreciated.

Leonid Spektor

Aug 6, 2013, 5:02:36 PM
to chib...@googlegroups.com
Paul,

COMBO was rewritten over a year ago to find all possible matches for search patterns that have multiple OR elements, represented with the "+" symbol. It also handles complex search patterns that mix negative and positive elements better. However, it was never tested on all-negative searches like +s'!*:wh|*^!*?' or "+s!xxx", because COMBO searches for any match of the search pattern anywhere within an utterance, and the two negative search patterns above will technically match virtually all utterances. For example, "+s!xxx" will match the utterance:

*CHI: xxx .

because the pattern "!xxx" does not match the utterance delimiter ".", which means the match was a success. The only way the pattern "!xxx" will not match an utterance is if the utterance contains only "xxx" and nothing else, like this:

*CHI: xxx


You've shown that this is not what people expect. I will try to change COMBO to be less literal, but it will take some time. In the meantime, please use KWAL as an alternative, as you noted you can.
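
The gap between the literal reading described above and the reading most users expect can be sketched in a few lines. This is only an illustration of the two interpretations, not COMBO's code; both function names are hypothetical, and an utterance is simplified to a list of whitespace-separated tokens.

```python
def literal_negation_match(tokens, word):
    """Literal reading: '!word' succeeds if ANY token differs from word."""
    return any(tok != word for tok in tokens)

def expected_negation_match(tokens, word):
    """Reading most users expect: succeed only if NO token equals word."""
    return all(tok != word for tok in tokens)

with_delimiter = ["xxx", "."]  # *CHI: xxx .
bare = ["xxx"]                 # *CHI: xxx

print(literal_negation_match(with_delimiter, "xxx"))   # True: "." is not "xxx"
print(literal_negation_match(bare, "xxx"))             # False: nothing but "xxx"
print(expected_negation_match(with_delimiter, "xxx"))  # False: "xxx" is present
```

Under the literal reading, almost every utterance matches because almost every utterance contains at least one token other than the negated word.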


Leonid.



--
You received this message because you are subscribed to the Google Groups "chibolts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to chibolts+u...@googlegroups.com.
To post to this group, send email to chib...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/chibolts/cbdacf62-dcd5-4286-982d-c7b8ee263bcd%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

paul

Aug 7, 2013, 1:11:15 PM
to chib...@googlegroups.com
On Tuesday, August 6, 2013, Spektor, Leonid: CMU wrote:
    ...

>> the two negative search patterns above will technically match
>> virtually all utterances.


Thanks. I realize negative matching can be hard to reason about but I
just want to make sure it's clear that the behavior I'm observing is
combo running forever with no output beyond start-up. It needs to be
stopped manually by sending a keyboard interrupt/killing the
process/force-quitting. After some inspection with a debugger I
discovered that it gets stuck endlessly checking for matches on a
single utterance, and I posted output from combo's built-in debugging
statements in my original post demonstrating this.

When I said I realized I could use kwal instead of combo, I meant for
the simpler '!xxx' search. I can use

  kwal ... -s'xxx' ...

to exclude 'xxx'. However, it's unclear to me what to do about the

   +s'!*:wh|*^!*?'

   (exclude utterances that contain a wh-tagged word and end with a
   question mark)

search since combo doesn't seem to support -s for negative matching
(-s is accepted but appears to behave identically to +s), and kwal
doesn't accept pattern-matching syntax (right?).
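
One stopgap is to filter the transcript outside CLAN. The sketch below follows the paraphrase above (drop *CHI utterances whose %mor tier has a wh-tagged word and whose main tier ends with "?"). It is not a CLAN feature, and it rests on simplifying assumptions: tiers start with "*", "%" or "@", continuation lines start with a tab, and nothing follows the utterance terminator. All function names are hypothetical, and the sample lines are made up for illustration.

```python
import re

# Rough analogue of the *:wh|* word pattern on the %mor tier.
WH_TAG = re.compile(r"\S*:wh\|\S*")

def merge_continuations(lines):
    """Join tab-indented continuation lines onto the tier they continue."""
    merged = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("\t") and merged:
            merged[-1] += " " + line.strip()
        elif line:
            merged.append(line)
    return merged

def speaker_utterances(lines, speaker="*CHI:"):
    """Yield (main_tier, mor_tier) pairs; mor_tier is "" when absent."""
    main, mor = None, ""
    for line in merge_continuations(lines):
        if line.startswith("*") or line.startswith("@"):
            if main is not None:
                yield main, mor
            main, mor = (line, "") if line.startswith(speaker) else (None, "")
        elif line.startswith("%mor:") and main is not None:
            mor = line
    if main is not None:
        yield main, mor

def keep(main, mor):
    """Drop utterances that have a wh-tagged word AND end with '?'."""
    is_question = main.rstrip().endswith("?")
    has_wh = WH_TAG.search(mor) is not None
    return not (is_question and has_wh)

# Made-up sample lines, not taken from the Valian corpus.
sample = [
    "*CHI:\twhat is that ?\n",
    "%mor:\tpro:wh|what v:cop|be&3S pro:dem|that ?\n",
    "*MOT:\tit's a tape .\n",
    "*CHI:\ttape it up .\n",
    "%mor:\tv|tape pro|it adv:loc|up .\n",
]
kept = [main for main, mor in speaker_utterances(sample) if keep(main, mor)]
print(kept)
```

Transcripts with timing bullets or postcodes after the terminator would need the end-of-utterance test adjusted.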

Any advice?

paul

Leonid Spektor

Aug 7, 2013, 1:51:33 PM
to chib...@googlegroups.com
Paul,

I understand that COMBO gets into an infinite loop when all-negative search patterns are specified. This bug was not detected before because we did not test all-negative search patterns: in COMBO's implementation they would have matched almost all utterances, and that seemed redundant. I understand now that it was an oversight to underestimate the ways COMBO is used. The infinite loop bug was easy to fix and it has been fixed, but that does not help with all-negative searches. To make all-negative searches work the way people expect will take a few days of work. Unfortunately, there is nothing you can do until then. I will post on chibolts again when COMBO is fixed, which should be by the end of today or tomorrow.


Leonid.




paul

Aug 7, 2013, 3:40:59 PM
to chib...@googlegroups.com
Great. Thanks!

paul

Leonid Spektor

Aug 7, 2013, 4:30:11 PM
to chibolts
Paul,

I have changed COMBO to handle all-negative searches better. The new CLAN is on the CHILDES web site.

Leonid.

Paul Feitzinger

Aug 12, 2013, 6:27:42 PM
to chib...@googlegroups.com
On 13/08/07, Leonid Spektor wrote:
> I have changed COMBO to handle all negative searches better. New
> CLAN is on childes web site.

Thanks for fixing that so quickly!

It seems to work properly now. I don't know if combo ever had a -s
flag like kwal does for excluding matching patterns, but perhaps
something like that would be better for this particular use case.

paul

Leonid Spektor

Aug 23, 2013, 8:13:26 PM
to chib...@googlegroups.com
Paul,

I have added a -s option to COMBO. COMBO never had it before, but it is a good idea. COMBO was the only CLAN command that had a +s option but not -s. I hope this will help.

Leonid.