"Maximal number of matches per subgraph reached"

Daniel Stein

Feb 6, 2015, 4:06:09 AM
to unitex-...@googlegroups.com
When I apply mygraph to a larger corpus, I receive the following error message:

Maximal number of matches per subgraph reached text text <<HERE>> text text text text <<END>> text text graph name: [mygraph:509:49]

I am not sure what the number at the end of the line refers to, so I am not sure how to debug this. Can you point me in the right direction?

Many thanks in advance
 

Denis Maurel

Feb 6, 2015, 6:50:50 AM
to Daniel Stein, unitex-...@googlegroups.com


Dear Daniel,

The diagnosis is simple: your graph opens too many possibilities. Unitex looks for the longest match by trying all possible paths. If there are too many, it fails.

The solution is not always simple: find where the graph opens too many possibilities and modify the graph, or replace it with two graphs (if you use a cascade...).
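
For intuition only, here is a rough Python analogy of what "too many possibilities" means (it is not how the Unitex engine is implemented, and the branch counts and token numbers are made up): if two branches of a graph can both accept each token, the number of distinct paths the engine would have to try doubles with every token, so a long ambiguous region quickly exceeds any fixed limit on matches per subgraph.

def candidate_paths(n_tokens, branches_per_token=2):
    """Distinct ways an ambiguous loop could cover n_tokens tokens."""
    return branches_per_token ** n_tokens

for n in (5, 10, 20, 30):
    print(n, "tokens ->", candidate_paths(n), "candidate paths")
# 30 tokens -> 1073741824 candidate paths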

Best regards,

Denis Maurel


____________________________________
Professor Denis Maurel
Université François Rabelais Tours
LI (Computer Science Research Laboratory)
EPU-DI
64 avenue Jean-Portalis
37200 Tours
France
Phone: 33-2.47.36.14.35
Fax: 33-2.47.36.14.22
mailto:denis....@univ-tours.fr

http://www.univ-tours.fr/maurel

http://www.li.univ-tours.fr
http://tln.li.univ-tours.fr/




Daniel Stein

Feb 6, 2015, 6:58:16 AM
to unitex-...@googlegroups.com, daniel...@gmail.com, denis....@univ-tours.fr
Yes, but do you know what the number is referring to? I have the impression it refers to the concrete part of the graph that produces the high number of results, but I can't find what exactly is meant when I inspect the file (there is no passage with the numbers 509:49 in mygraph).

Thanks again, Denis,

this time it is not the cascade ;-)

Daniel

Nebojsa Vasiljevic

Feb 6, 2015, 9:41:01 AM
to Daniel Stein, unitex-...@googlegroups.com, denis....@univ-tours.fr
Daniel,

In my experience, the problem can occur when your graph tries to match too long a part of the text, even if that part is just a potential beginning of something that could be matched.

I try to follow this rule: a graph should always decide on acceptance (either positively or negatively) after a relatively small number of tokens. Two typical cases in which you break this rule are:

- your graph is designed to match a large part of the text
- an unexpectedly long structure occurs in the text (e.g. some enumeration or other kind of repetition; see the sketch after this list)
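
To make the second case concrete, here is a minimal sketch in plain Python (an analogy, not Unitex code; the token lists are invented): a graph that can only accept or reject when it sees a closing token cannot decide early, so a runaway enumeration keeps the match attempt open across thousands of tokens instead of a handful.

def tokens_until_decision(tokens, closer=")"):
    """How many tokens are read before the matcher can accept or give up."""
    for i, tok in enumerate(tokens, start=1):
        if tok == closer:
            return i          # decision reached: accept here
    return len(tokens)        # no decision until the whole input was read

well_behaved = ["(", "a", ",", "b", ")"]
runaway = ["("] + ["item", ","] * 5000      # enumeration that never closes
print(tokens_until_decision(well_behaved))  # 5
print(tokens_until_decision(runaway))       # 10001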

In your particular case, try to isolate the smallest piece of text that produces the problem; then you will probably figure out what you should try to change in your graphs.
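
If it helps, this is one way to automate that isolation step (a sketch; triggers_error is a hypothetical predicate you would have to write yourself, for example by running Locate on the candidate text and checking its console output for the warning):

def smallest_failing_slice(sentences, triggers_error):
    """Shrink a list of sentences to a small contiguous slice that still fails."""
    lo, hi = 0, len(sentences)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if triggers_error(sentences[lo:mid]):
            hi = mid          # the problem is already in the first half
        elif triggers_error(sentences[mid:hi]):
            lo = mid          # the problem is in the second half
        else:
            break             # both halves are needed to trigger it; stop here
    return sentences[lo:hi]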

The level of statically unresolvable non-determinism in your graph can also be significant. Sometimes flattening may help.

Regards,
Nebojša


eric.laporte

Feb 6, 2015, 10:37:01 AM
to unitex-...@googlegroups.com
Dear Daniel,
A frequent case where this occurs is a <TOKEN> loop. Since a text is made of tokens, such a loop may recognize any span of the text. A simple solution is to limit the graph so that it can only recognize within a sentence. To do that, make sure you have applied the graph that inserts sentence delimiters {S} during preprocessing, and exclude {S} from the loop by replacing <TOKEN> with less general lexical masks.
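
As a rough illustration in plain Python (an analogy, not Unitex's matching algorithm; the sentence lengths are invented): a loop over any token can start and end anywhere, so the number of spans it could cover grows quadratically with text length, while a loop that may not cross {S} is bounded by the longest sentence.

def spans(n):
    """Number of non-empty contiguous spans over n tokens."""
    return n * (n + 1) // 2

sentence_lengths = [20, 15, 30, 25]   # tokens per sentence (made-up numbers)
whole_text = sum(sentence_lengths)

print("unbounded <TOKEN> loop:", spans(whole_text), "candidate spans")
print("bounded by {S}:", sum(spans(n) for n in sentence_lengths), "candidate spans")
# 4095 vs 1120: the per-subgraph limit is reached much later, or not at all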
Best,
Eric