Hi Linas,
Yes, in September I tried to run the language-learning experiment according to your instructions and Rohit Shinde's Q&A on the newsgroup. I ran a pipeline from the opencog repo:
split-sentences.pl -> link-grammar -> relex -> scheme -> atomspace -> postgres
Basically it worked, but very, very slowly. I estimated it would have taken months to get a reasonably sized disjunct dataset for clustering in the next step. So I wrote some simple but efficient C++ programs that do the same work without the atomspace, and ran this pipeline:
split-sentences.pl -> link-grammar -> text-files -> C++ programs -> text-files
It took just a few days on a 3-core machine to get the mutual information for word pairs and the disjunct sets for sentences after MST parsing:
(dataset size: ~24M sentences, ~750K words, ~26M word-pairs, language: English)
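To make the mutual-information step concrete, here is a minimal sketch in Python (not the C++ programs mentioned above, which are not shown in this thread). It assumes word pairs are counted inside a fixed-size window, which is only a stand-in for the pairs that the link-grammar step produces; the function names `count_pairs` and `pair_mutual_information` and the `window` parameter are hypothetical.

```python
import math
from collections import Counter


def count_pairs(sentences, window=6):
    """Count ordered word pairs (left word, right word) within a window.

    Assumption: windowed counting replaces the link-grammar parses used
    in the real pipeline; it is only meant to produce some pair counts.
    """
    counts = Counter()
    for sentence in sentences:
        words = sentence.split()
        for i, left_word in enumerate(words):
            for right_word in words[i + 1:i + 1 + window]:
                counts[(left_word, right_word)] += 1
    return counts


def pair_mutual_information(pair_counts):
    """Return the pointwise mutual information (in bits) of each pair.

    MI(l, r) = log2( p(l, r) / (p(l, *) * p(*, r)) ), where the
    probabilities are estimated from the raw pair counts.
    """
    total = sum(pair_counts.values())
    left_totals = Counter()
    right_totals = Counter()
    for (left_word, right_word), n in pair_counts.items():
        left_totals[left_word] += n
        right_totals[right_word] += n

    mi = {}
    for (left_word, right_word), n in pair_counts.items():
        # n/total divided by (left/total * right/total) simplifies to this:
        ratio = n * total / (left_totals[left_word] * right_totals[right_word])
        mi[(left_word, right_word)] = math.log2(ratio)
    return mi
```

Calling `pair_mutual_information(count_pairs(sentences))` on a list of sentence strings returns a dictionary keyed by word pair. The MST parsing step would then pick, for each sentence, the set of links that maximizes the total of these scores; that part is not sketched here.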
And then I suspended this experiment because of lack of time.
As far as I understand, the next steps of this experiment are the ones you described here:
https://docs.google.com/viewer?a=v&pid=forums&srcid=MDQ3MzU0NzU5MTM4MjQ0MDEwOTgBMDgxMTg0NDQyODM5MjI4MDIwOTUBa3R4d2pORmdlRTBKATAuMQEBdjI
I think I can make some programming contribution to this project; I will probably have some time after January 20 or later. If you see something specific to do, please let me know, as I am not aware of the current status of this project. Of course, as a native speaker I can also help verify experiments with the Polish language.