Hi Linas,
I would expect that those links are not returned, because "Mom" has
nothing to do with "telescope" in the first sentence, and "chalk" has
nothing to do with "on" in the second sentence, to my understanding.
We have had a discussion on this involving Andres, so we had to create
manually edited "gold standard" parses (which do not have "extra links"):
http://88.99.210.144/data/andres_parses/poc-english_ex-parses-gold.txt
compared to LG "silver standard" parses (with those extra links):
http://langlearn.singularitynet.io/data/parses/English/POC-English/poc_english-LG-silver.txt.ull
I recall we had a discussion that a future version of LG would have this
fixed, and I hoped we could get the "gold" (manual) and "silver" (LG)
standards merged. Sorry if I misunderstood that.
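For what it's worth, the way such gold-vs-silver comparisons are scored can be sketched in a few lines. This is a hypothetical helper, not our actual pipeline code; links are treated as unordered (word-index, word-index) pairs rather than the real .ull file format:

```python
def link_f1(gold_links, test_links):
    """Score a test parse against a gold parse, where each parse is a
    collection of links given as unordered (word_index, word_index) pairs."""
    gold = {tuple(sorted(l)) for l in gold_links}
    test = {tuple(sorted(l)) for l in test_links}
    tp = len(gold & test)                        # links both parses agree on
    precision = tp / len(test) if test else 0.0  # penalizes "extra links"
    recall = tp / len(gold) if gold else 0.0     # penalizes missing links
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

Under this kind of scoring, "extra links" in a silver parse hurt precision but leave recall untouched.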
On the new LG version and its performance on long sentences - cool, we
would love to have it, because we have major performance problems with LG
parsing Gutenberg Children - you can get the dictionary here:
http://langlearn.singularitynet.io/data/clustering_2018/Gutenberg-Children-Books-1000-disjuncts-2018-10-29/Gutenberg-Children-Books-Caps-50-clusters-1000-disjuncts-2018-10-29_/Gutenberg-Children-Books-Caps_LG-English_cALEd_no-LW_no-RW_no-gen/
and try to parse the corpus that was used to create this dictionary:
http://langlearn.singularitynet.io/data/cleaned/English/Gutenberg-Children-Books/capital/
Right now, it takes a technical infinity to do the parse for the above;
we have never got the parse results with the current version of LG.
Cheers,
-Anton
15.11.2018 11:21, Linas Vepstas:
>
> On Wed, Nov 14, 2018 at 3:25 AM Anton Kolonin @ Aigents
> <akol...@aigents.com <mailto:akol...@aigents.com>> wrote:
>
> Hi Amir and Linas,
>
> We have finally upgraded to LG 5.5.1 and see that some sentences in our
> reference corpus are not parsed right:
> http://langlearn.singularitynet.io/data/poc-english/poc_english.txt
>
>
> For one example:
>
> link-parser
> link-grammar: Info: Dictionary found at
> /home/akolonin/miniconda3/envs/ull-lg55/share/link-grammar/en/4.0.dict
> link-grammar: Info: Dictionary version 5.5.1, locale en_US.UTF-8
> link-grammar: Info: Library version link-grammar-5.5.1. Enter "!help"
> for help.
> linkparser> Dad saw Mom with a telescope.
> Found 18 linkages (18 had no P.P. violations)
> Linkage 1, cost vector = (UNUSED=0 DIS=-0.61 LEN=9)
>
> +---------------------Xp---------------------+
> +----->WV----->+-----MVp----+----Js---+ |
> +-->Wd--+-Ss*s-+--Os--+--Mp-+ +-Ds**c+ |
> | | | | | | | |
> LEFT-WALL dad.m saw.v-d Mom.l with a telescope.n .
>
> Press RETURN for the next linkage.
>
>
> linkparser> Mom writes with chalk on the board.
> Found 32 linkages (32 had no P.P. violations)
> Linkage 1, cost vector = (UNUSED=0 DIS=-0.61 LEN=11)
>
> +-------------------------Xp-------------------------+
> +------>WV----->+---------MVp--------+----Ju---+ |
> +-->Wd--+--Ss*s-+--MVp-+--Ju--+--Mp--+ +--Dmu-+ |
> | | | | | | | | |
> LEFT-WALL Mom.l writes.v with chalk.n-u on the board.n-u .
>
> Press RETURN for the next linkage.
> linkparser>
>
>
>
> In the sentences above, links
>
> Mom.l--Mp-with
>
> and
>
> chalk.n-u--Mp--on
>
> seem unexpected.
>
>
> Unexpected, perhaps, but correct, as far as I can tell, and documented in
> great detail:
>
> https://www.abisource.com/projects/link-grammar/dict/section-M.html
>
> As I recall from some earlier discussions, issues like those should not
> exist in the latest LG version.
>
> Do we misunderstand something, or should we rather create an issue on
> this matter?
>
>
> I don't see the issue. It appears to be 100% correct. What were you
> expecting to happen, instead?
>
> --linas
>
> p.s. it would be more convenient if you used the link-grammar mailing list.
>
> --
> cassette tapes - analog TV - film cameras - you
>
> --
> You received this message because you are subscribed to the Google
> Groups "lang-learn" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to lang-learn+...@googlegroups.com
> <mailto:lang-learn+...@googlegroups.com>.
> To post to this group, send email to lang-...@googlegroups.com
> <mailto:lang-...@googlegroups.com>.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/lang-learn/CAHrUA36%2BQMxMjwRz493siJVtzjBGDOHF2p3PsYGXkrHiSa_cMA%40mail.gmail.com
> <https://groups.google.com/d/msgid/lang-learn/CAHrUA36%2BQMxMjwRz493siJVtzjBGDOHF2p3PsYGXkrHiSa_cMA%40mail.gmail.com?utm_medium=email&utm_source=footer>.
> For more options, visit https://groups.google.com/d/optout.
--
-Anton Kolonin
skype: akolonin
cell: +79139250058
akol...@aigents.com
https://aigents.com
https://www.youtube.com/aigents
https://www.facebook.com/aigents
https://plus.google.com/+Aigents
https://medium.com/@aigents
https://steemit.com/@aigents
https://golos.blog/@aigents
https://vk.com/aigents
Hi Linas, the site is up and running - attaching the screenshots.
I see your reasoning, actually.
But since you suggested referring to the Stanford Parser, I have just checked, and it does it the other way:
     +----B---+
     +-R-+-RS-+
     |   |    |
The dog who chased me was black
Linas, thank you!
The dog who chased me was black

For example, this parse makes sense, and seems right:
    +-------->WV--------->+
    +----->Wd-----+       |
    |      +Ds**c-+-Ss*s--+---Pa--+
    |      |      |       |       |
LEFT-WALL the   dog.n  was.v-d black.a

but there is another possibility, that kind-of makes sense (and which
perhaps language learning will find):
    +---->Wd----->+
    |             +-->adjcomp->+
    |      +Ds**c-+     +<-cop<+
    |      |      |     |      |
LEFT-WALL the   dog.n  was   black

Here, adjcomp is "adjectival complement" and "cop" is the copula. Some
dependency grammars draw this graph; some call it "predicative adjectival
modifier". Let's quibble. Note that I did not draw an arrow from subject to
verb. I could, I suppose. Note that it is now IMPOSSIBLE to draw an arrow
from root/left-wall to the verb, because it would require a link crossing:
it would have to cross over the adjcomp arrow.

Thus, if you want to draw an arrow from root to head-verb, and also get a
planar graph, you are not allowed to draw the adjcomp/predadj arrow. That
helps explain what LG does. It also helps make clear that the
no-links-crossing constraint is imperfect. It seems reasonable, but clearly,
there is a violation in the above rather trivial sentence!
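The planarity argument can be checked mechanically. A minimal sketch, with word positions taken from the example above; `crosses` and `is_planar` are illustrative helpers, not part of the LG API:

```python
def crosses(a, b):
    """Two links cross iff exactly one endpoint of b lies strictly
    inside a's span. Links are (word_index, word_index) pairs."""
    (a1, a2), (b1, b2) = sorted(a), sorted(b)
    return a1 < b1 < a2 < b2 or b1 < a1 < b2 < a2

def is_planar(links):
    """True iff no pair of links crosses (the no-links-crossing rule)."""
    return not any(crosses(x, y)
                   for i, x in enumerate(links)
                   for y in links[i + 1:])

# Word positions: LEFT-WALL=0  the=1  dog=2  was=3  black=4
Ds, adjcomp, cop = (1, 2), (2, 4), (3, 4)
WV = (0, 3)  # would-be arrow from the left wall to the head verb

print(is_planar([Ds, adjcomp, cop]))      # True: the adjcomp parse is planar
print(is_planar([Ds, adjcomp, cop, WV]))  # False: adding WV forces a crossing
```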
Hello Linas. If you leave it to the learning mechanism, aren't you inevitably going to get crossed links? To take an even simpler example, "It was raining", your learning mechanism should work out three predictions:
When you put these expectations together, you find a dependency triangle, with subject links from both verbs to "it" and dependency from "was" to "raining". Since both of the "it" links are the same ('subject'),
there's no reason for assigning them to different levels of structure (deep vs surface), so you get a topological tangle.
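The triangle can be written out explicitly (link labels here are hypothetical; the point is just that "it" ends up with two heads, which no tree-shaped dependency structure allows):

```python
# Word positions: it=0  was=1  raining=2
# Each link is (label, head, dependent), following the three predictions.
links = [("subj", 1, 0),   # "was" takes "it" as its subject
         ("subj", 2, 0),   # "raining" also takes "it" as its subject
         ("dep",  1, 2)]   # "was" governs "raining"

# In a dependency tree, each word has at most one head.
heads = {}
for _, head, dep in links:
    heads.setdefault(dep, []).append(head)

print(heads[0])  # [1, 2] -- "it" has two heads: a tangle, not a tree
```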
Dick
-- Richard Hudson (dickhudson.com)
Wow! Thanks for the teach-in, Linas. Very interesting, and you make it all as clear as it could be, I guess.
I think of your projects as an experiment whose goal is to find how far it's possible to get in learning a language using nothing but written records - a bit like deciphering a dead language simply by spotting patterns in the available corpus of texts. To some extent, your success will reflect the quality and 'depth' of the raw data, which (in the case of English texts) already reflect quite a sophisticated linguistic analysis thanks to the word spaces, the punctuation and the spelling (which distinguishes some homophones). I'm not sure what it will tell us about human language, but it will presumably tell us a lot about the limits of AI. Would you agree?
Anyway, I'm very impressed by what you and your colleagues in this field have achieved already.
Best wishes, Dick
-- Richard Hudson (dickhudson.com)
Hi Amir,
We have found that the same corpus may be parsed with the same LG
version 5.5.1 either in 19/53 minutes or hang "forever", depending
on the nature of the machine-generated dict file (obtained with our
unsupervised learning pipeline).
We have identified that the "forever" hangs are happening due to the
"combinatorial explosions" that you explained in an earlier issue:
https://github.com/opencog/link-grammar/issues/798
Now, we are implementing the workaround that you suggested, to skip
the sentences causing the "combinatorial explosions"; however, we would
like to ask you whether this is really the case.
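The skip logic we are implementing amounts to something like the sketch below. It assumes the parser is driven as an external process; `parse_or_skip` is a hypothetical helper, not part of the LG API:

```python
import subprocess

def parse_or_skip(cmd, sentence, timeout_s):
    """Feed one sentence to an external parser command on stdin and
    return its stdout, or None if parsing exceeds timeout_s seconds
    (i.e. the sentence is skipped as a likely combinatorial explosion)."""
    try:
        result = subprocess.run(cmd, input=sentence, text=True,
                                capture_output=True, timeout=timeout_s)
        return result.stdout
    except subprocess.TimeoutExpired:
        return None
```

Here cmd would be something like ["link-parser", "dict_dir/", "-graphics=0"], with a per-sentence budget of a few seconds.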
For example, using the dictionary
http://langlearn.singularitynet.io/data/aglushchenko_parses/test-lg-5.1.1/dict.tar.gz
while parsing the file
http://langlearn.singularitynet.io/data/aglushchenko_parses/test-lg-5.1.1/corpus.tar.gz
there is "combinatorial explosion" on the sentence:
"Now not far from the music master's house there dwelt a lady who
possessed a most lovely little pussy cat called Koma."
There are many more combinatorial explosions in the same file.
Do you think this is expected?
If so, do you think the best solution is to skip the sentence or to
introduce costs in the LG dictionary?
Note that, using other machine-generated dictionaries, the whole following
batch of files, including the file mentioned above, is processed decently
fast (19 or 53 minutes; see details below):
http://langlearn.singularitynet.io/data/cleaned/English/Gutenberg-Children-Books/capital/
Thanks,
-Anton
23.11.2018 11:41, Anton Kolonin:
>
>
> 22.11.2018 20:50, Alexey Glushchenko:
>> There are two more in Gutenberg-Children-Books-500-disjuncts-2018-10-31:
>>
>> Gutenberg-Children-Books-Caps-20-clusters-2018-10-31
>> Gutenberg-Children-Books-Caps_LG-ANY-all-parses-agm-opt_cALEd_no-LW_no-RW_no-gen
>> - complete (11hrs 28 min 49 sec)
>
>
> Maximum disjunct length 16
> dict_20C_2018-10-31_0006.4.0.dict 2018-10-31 15:47 198K
> Average sentence parse: 59.63%
> Recall: 31.05%
> Precision: 37.99%
> F1: 34.17%
>
>
>> Gutenberg-Children-Books-Caps_LG-English_cALEd_no-LW_no-RW_no-gen -
>> complete (12 min 48 sec)
>
> Maximum disjunct length 10
> dict_20C_2018-10-31_0006.4.0.dict 2018-10-31 10:32 182K
> Average sentence parse: 55.29%
> Recall: 29.55%
> Precision: 34.44%
> F1: 31.81%
>
>
>> Gutenberg-Children-Books-Caps-10-clusters-1000-disjuncts-2018-10-29_
>>
>>
>> Gutenberg-Children-Books-Caps_LG-ANY-all-parses-agm-opt_cALEd_no-LW_no-RW_no-gen
>>
>> - in progress since Nov 20 02:10 (server time)
>>
>
> Maximum disjunct length 16
> dict_10C_2018-10-29_0006.4.0.dict 2018-10-29 18:51 229K
>
>
>> Gutenberg-Children-Books-Caps_LG-English_cALEd_no-LW_no-RW_no-gen -
>> in progress since Nov 20 02:10 (server time)
>>
>
> Maximum disjunct length 10
> dict_10C_2018-10-29_0006.4.0.dict 2018-10-29 16:15 231K
>
>
>> Gutenberg-Children-Books-Caps-20-clusters-1000-disjuncts-2018-10-29_
>>
>>
>> Gutenberg-Children-Books-Caps_LG-ANY-all-parses-agm-opt_cALEd_no-LW_no-RW_no-gen
>>
>> - in progress since Nov 20 02:10 (server time)
>>
>
> Maximum disjunct length 16
> dict_20C_2018-10-31_0006.4.0.dict 2018-10-31 12:00 265K
>
>
>> Gutenberg-Children-Books-Caps_LG-English_cALEd_no-LW_no-RW_no-gen -
>> complete (53 min 55 sec)
>>
>
> Maximum disjunct length 10
> dict_20C_2018-10-29_0006.4.0.dict 2018-10-29 18:21 283K
> Average sentence parse: 58.19%
> Recall: 32.07%
> Precision: 35.23%
> F1: 33.58%
>
>>
>> Gutenberg-Children-Books-Caps-50-clusters-1000-disjuncts-2018-10-29_
>> Gutenberg-Children-Books-Caps_LG-English_cALEd_no-LW_no-RW_no-gen
>> - complete (19 min 40 sec)
>>
>
> Maximum disjunct length 10
> dict_50C_2018-10-29_0006.4.0.dict 2018-10-29 18:29 517K
> Average sentence parse: 44.35%
> Recall: 21.89%
> Precision: 32.01%
> F1: 26.00%
!use-sat does not help much. In one case it smashes the stack, while in
another case it cannot parse with null links:
$ echo "Now not far from the music master's house there dwelt a lady who possessed a most lovely little pussy cat called Koma." | link-parser dict_10C_2018-10-29_0006/ -timeout=1 -postscript=1 -graphics=0 -verbosity=1 -use-sat=1
timeout set to 1
postscript set to 1
graphics set to 0
verbosity set to 1
use-sat set to 1
link-grammar: Info: Dictionary found at ./dict_10C_2018-10-29_0006/4.0.dict
link-grammar: Info: Dictionary version 0.0.6, locale en_US.UTF-8
link-grammar: Info: Library version link-grammar-5.5.1. Enter "!help" for help.
*** stack smashing detected ***: link-parser terminated
Aborted
$ echo "Hello world!" | link-parser dict_10C_2018-10-29_0006/ -timeout=1 -postscript=1 -graphics=0 -verbosity=1 -use-sat=1
timeout set to 1
postscript set to 1
graphics set to 0
verbosity set to 1
use-sat set to 1
link-grammar: Info: Dictionary found at ./dict_10C_2018-10-29_0006/4.0.dict
link-grammar: Info: Dictionary version 0.0.6, locale en_US.UTF-8
link-grammar: Info: Library version link-grammar-5.5.1. Enter "!help" for help.
No complete linkages found.
link-grammar: Info: use-sat: Cannot parse with null links (yet).
Set the "null" option to 0 to turn off parsing with null links.
link-grammar: Info: Freeing dictionary dict_10C_2018-10-29_0006/4.0.dict
link-grammar: Info: Freeing dictionary dict_10C_2018-10-29_0006/4.0.affix
Bye.
Monday, November 26, 2018, 16:26 +09:00, from Linas Vepstas <linasv...@gmail.com>: