"Maximal number of matches per subgraph reached"

Daniel Stein

Feb 6, 2015, 4:06:09 AM
to unitex-...@googlegroups.com
When I apply mygraph to a larger corpus, I receive the following error message:

Maximal number of matches per subgraph reached text text <<HERE>> text text text text <<END>> text text graph name: [mygraph:509:49]

I am not sure what the number at the end of the line refers to, so I am not sure how to debug this. Can you point me in the right direction?

Many thanks in advance
 

Denis Maurel

Feb 6, 2015, 6:50:50 AM
to Daniel Stein, unitex-...@googlegroups.com


Dear Daniel,

The diagnosis is simple: your graph opens too many possibilities. Unitex looks for the longest match by trying all possible paths. If there are too many, it fails.

The solution is not always simple: find where the graph opens too many possibilities and modify the graph, or replace it with two graphs (if you use a cascade...).
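
For intuition only, here is a rough Python analogy of what "too many possibilities" means (it is not how the Unitex engine is implemented, and the branch counts and token numbers are made up): if two branches of a graph can both accept each token, the number of distinct paths the engine would have to try doubles with every token, so a long ambiguous region quickly exceeds any fixed limit on matches per subgraph.

def candidate_paths(n_tokens, branches_per_token=2):
    """Distinct ways an ambiguous loop could cover n_tokens tokens."""
    return branches_per_token ** n_tokens

for n in (5, 10, 20, 30):
    print(n, "tokens ->", candidate_paths(n), "candidate paths")
# 30 tokens -> 1073741824 candidate paths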

Best regards,

Denis Maurel


____________________________________
Professor Denis Maurel
Université François Rabelais Tours
LI (Computer Science Research Laboratory)
EPU-DI
64 avenue Jean-Portalis
37200 Tours
France
Phone: 33-2.47.36.14.35
Fax: 33-2.47.36.14.22
mailto:denis....@univ-tours.fr

http://www.univ-tours.fr/maurel

http://www.li.univ-tours.fr
http://tln.li.univ-tours.fr/




Daniel Stein

Feb 6, 2015, 6:58:16 AM
to unitex-...@googlegroups.com, daniel...@gmail.com, denis....@univ-tours.fr
Yes, but do you know what the number is referring to? I have the impression it refers to the concrete part of the graph that produces the high number of results, but I can't find what exactly is meant when I inspect the file (there is no passage with the numbers 509:49 in mygraph).

Thanks again, Denis,

this time it is not the cascade ;-)

Daniel

Nebojsa Vasiljevic

Feb 6, 2015, 9:41:01 AM
to Daniel Stein, unitex-...@googlegroups.com, denis....@univ-tours.fr
Daniel,

In my experience, the problem can occur when your graph tries to match too long a part of the text, even if that part is just a potential beginning of something that could be matched.

I try to follow this rule: a graph should always decide on acceptance (either positively or negatively) after a relatively small number of tokens. Two typical cases in which you break this rule are:

- your graph is designed to match a large part of the text
- an unexpectedly long structure occurs in the text (e.g. some enumeration or other kind of repetition; see the sketch after this list)
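
To make the second case concrete, here is a minimal sketch in plain Python (an analogy, not Unitex code; the token lists are invented): a graph that can only accept or reject when it sees a closing token cannot decide early, so a runaway enumeration keeps the match attempt open across thousands of tokens instead of a handful.

def tokens_until_decision(tokens, closer=")"):
    """How many tokens are read before the matcher can accept or give up."""
    for i, tok in enumerate(tokens, start=1):
        if tok == closer:
            return i          # decision reached: accept here
    return len(tokens)        # no decision until the whole input was read

well_behaved = ["(", "a", ",", "b", ")"]
runaway = ["("] + ["item", ","] * 5000      # enumeration that never closes
print(tokens_until_decision(well_behaved))  # 5
print(tokens_until_decision(runaway))       # 10001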

In your particular case, try to isolate the smallest piece of text that produces the problem; then you will probably figure out what you should try to change in your graphs.
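
If it helps, this is one way to automate that isolation step (a sketch; triggers_error is a hypothetical predicate you would have to write yourself, for example by running Locate on the candidate text and checking its console output for the warning):

def smallest_failing_slice(sentences, triggers_error):
    """Shrink a list of sentences to a small contiguous slice that still fails."""
    lo, hi = 0, len(sentences)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if triggers_error(sentences[lo:mid]):
            hi = mid          # the problem is already in the first half
        elif triggers_error(sentences[mid:hi]):
            lo = mid          # the problem is in the second half
        else:
            break             # both halves are needed to trigger it; stop here
    return sentences[lo:hi]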

The level of statically unresolvable non-determinism in your graph can also be significant. Sometimes flattening may help.

Regards,
Nebojša


eric.laporte

Feb 6, 2015, 10:37:01 AM
to unitex-...@googlegroups.com
Dear Daniel,
A frequent case where this occurs is a <TOKEN> loop. Since a text is made of tokens, such a loop may recognize any span of the text. A simple solution is to limit the graph so that it can only recognize within a sentence. To do that, make sure you have applied the graph that inserts sentence delimiters {S} during preprocessing, and exclude {S} from the loop by replacing <TOKEN> with less general lexical masks.
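
As a rough illustration in plain Python (an analogy, not Unitex's matching algorithm; the sentence lengths are invented): a loop over any token can start and end anywhere, so the number of spans it could cover grows quadratically with text length, while a loop that may not cross {S} is bounded by the longest sentence.

def spans(n):
    """Number of non-empty contiguous spans over n tokens."""
    return n * (n + 1) // 2

sentence_lengths = [20, 15, 30, 25]   # tokens per sentence (made-up numbers)
whole_text = sum(sentence_lengths)

print("unbounded <TOKEN> loop:", spans(whole_text), "candidate spans")
print("bounded by {S}:", sum(spans(n) for n in sentence_lengths), "candidate spans")
# 4095 vs 1120: the per-subgraph limit is reached much later, or not at all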
Best,
Eric