Note also that you can use the 4.0.regex file in order to define entities (among other things).
Here is an example, as a basic explanation of how to use this idea (you may need to refine it, and this may be a non-trivial work).
For the purpose of this example, suppose you want any word with a capital letter, that is not recognized otherwise, to be considered as an entity.
Furthermore, suppose you also want all words with an underscore in them to be considered as entities too.
To that end, define this in 4.0.regex the definition of <UNKNOWN-WORD>:
<ENTITY>: /[[:upper:]][^'’]*[^[:punct:]]$/ % Unknown words that contain an uppercase letter
<ENTITY>: /_*[^[:punct:]]$/ % Unknown words that contain an underscore
% ...
<UNKNOWN-WORD>: /^[.,-]{4}[.,-]*$/
In the dictionary file add anywhere in the entity word list:
<ENTITY>
/en/words/entities.gods:
<marker-entity> or <given-names> or <directive-opener> or <directive-subject>;
(The [:punct:] stuff is for excluding word-ending punctuation so the regex would not swallow it.)
If you want that your regex entities would appear with the subscript .b, add <ENTITY>.b to the dictionary instead of just <ENTITY> (in the 4.0.regex file it should remain unsubscripted!).
You will get (I used !morphology to display the regex label):
linkparser> !morphology
Display word morphology turned on.
linkparser> This is an iPhone.
Found 8 linkages (8 had no P.P. violations)
Linkage 1, cost vector = (UNUSED=0 DIS= 2.00 LEN=6)
+------------------Xp------------------+
+----->WV----->+-----Oste----+ |
+-->Wd---+-Ss*b+ +--Ds**v--+ |
| | | | | |
LEFT-WALL this.p is.v an iPhone[!<ENTITY>] .
Press RETURN for the next linkage.
linkparser> The variable test_it is not initialized.
Found 26 linkages (26 had no P.P. violations)
Linkage 1, cost vector = (UNUSED=0 DIS= 2.05 LEN=13)
+--------------------------------Xp--------------------------------+
+------------------->WV------------------>+ |
+------------->Wd-------------+ | |
| +-----------D----------+ +-------Pa------+ |
| | +-------A------+----Ss*s---+-EBm+----EA----+ |
| | | | | | | |
LEFT-WALL the variable.a test_it[!<ENTITY>] is.v not.e initialized.v-d .
For defining really complex pattern matches, you would need to install the PCRE2 library (libpcre2-dev) and rebuild link-grammar.
To use it, see the manual of pcrepattern.
To check if the link-grammar library already uses it, do from the link-grammar source directory:
ldd link-grammar/.libs/liblink-grammar.so
and check for
libpcre2-8.
--
Amir