Re: [opencog/relex] Problem with apostrophes for language ANY (#248)

Ben Goertzel

Jan 6, 2017, 12:01:36 PM
to opencog/relex, opencog, opencog/relex, Subscribed
Very cool Jacek!

Of course there is no strong reason to do this sort of textual MI calculation in Atomspace, when it can be done just as easily with standalone scripts.

However, once we get to the later phases of the algorithm, where we are doing subtree MI calculations on parse trees (rather than on word sequences), it would be really nice to use Atomspace (rather than making some separate, standalone tree data structure)... Though of course, in the end "practicably doable" trumps "really nice"...

Senna and I previously discussed the idea of making a kind of "frozen Atomspace", to be used when one wants to repeatedly query or analyze a certain Atomspace during a time interval in which one does not need to change that Atomspace... So it would be an Atomspace that was not changeable, but was very rapidly traversable... I wonder if this is a good solution to this sort of problem. I.e. once one has done the MST parsing, one gets a bunch of parse trees, puts them in an Atomspace, freezes that Atomspace, and then does a bunch of iterated calculations on it to calculate subtree MI values...
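
To make the "build once, freeze, then query many times" idea concrete, here is a rough Python sketch. It is not Atomspace code and does not use any OpenCog API; the class name `FrozenTreeStore`, the helper `subtrees`, and the representation of a parse tree as nested `(label, child, child, ...)` tuples are all assumptions made for illustration only.

```python
from collections import Counter
from types import MappingProxyType


def subtrees(tree):
    """Yield every subtree of a tree given as a (label, child, child, ...) tuple."""
    yield tree
    for child in tree[1:]:
        yield from subtrees(child)


class FrozenTreeStore:
    """Hypothetical read-only store: indexes subtrees once, then only answers queries."""

    def __init__(self, trees):
        self._trees = tuple(trees)  # immutable copy of the input trees
        counts = Counter()
        for tree in self._trees:
            counts.update(subtrees(tree))
        # MappingProxyType exposes a read-only view, so the index cannot be changed.
        self.counts = MappingProxyType(dict(counts))

    def count(self, subtree):
        """Return how many times `subtree` occurs across all stored trees."""
        return self.counts.get(subtree, 0)
```

Because the index is computed once and never mutated, repeated subtree-count lookups (the raw material for subtree MI) are plain dictionary reads. Whether a real frozen Atomspace would be organized this way is an open question.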

Just speculating a bit... ;)





On Sat, Jan 7, 2017 at 12:51 AM, Jacek Świergocki <notifi...@github.com> wrote:

Hi Linas,
Yes, in September I tried to run the language-learning experiment following your instructions and the Rohit Shinde Q&A on the newsgroup. I ran a pipeline from the opencog repo:

split-sentences.pl -> link-grammar -> relex -> scheme -> atomspace -> postgres

Basically it worked, but very, very slowly. I estimated it would take months to get a reasonably sized disjunct dataset for the clustering in the next step. So I wrote some simple but efficient C++ programs that do the same thing without the atomspace, and ran this pipeline:

split-sentences.pl -> link-grammar -> text-files -> c++ programs -> text-files

It took just a few days on a 3-core machine to get mutual information values for word pairs and disjunct sets for sentences after MST parsing
(dataset size: ~24M sentences, ~750K words, ~26M word-pairs, language: English).
I then suspended the experiment for lack of time.
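
For readers unfamiliar with the word-pair MI step, a minimal Python sketch follows. It is not the C++ code mentioned above. It assumes pairs are counted inside a sliding window, which is a simplification, since the pipeline above takes its pairs from link-grammar parses. The function name `pair_mi` and the `window` parameter are hypothetical.

```python
import math
from collections import Counter


def pair_mi(sentences, window=2):
    """Return {(left, right): MI in bits} for ordered word pairs.

    `sentences` is an iterable of word lists. A pair (left, right) is counted
    when `right` follows `left` within `window` words (an assumption made here).
    """
    pair_counts = Counter()
    for words in sentences:
        for i, left in enumerate(words):
            for right in words[i + 1 : i + 1 + window]:
                pair_counts[(left, right)] += 1

    total = sum(pair_counts.values())
    left_counts = Counter()   # N(left, *): pairs with this word on the left
    right_counts = Counter()  # N(*, right): pairs with this word on the right
    for (left, right), n in pair_counts.items():
        left_counts[left] += n
        right_counts[right] += n

    # MI(l, r) = log2( p(l, r) / (p(l, *) * p(*, r)) )
    #          = log2( N(l, r) * N / (N(l, *) * N(*, r)) )
    return {
        (l, r): math.log2(n * total / (left_counts[l] * right_counts[r]))
        for (l, r), n in pair_counts.items()
    }
```

Calling `pair_mi([["the", "cat", "sat"], ["the", "dog", "sat"]])` returns one MI value per observed ordered pair. At the scale quoted above (~26M word pairs) the counts would more likely be streamed to disk or sharded than held in one in-memory dictionary.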

As far as I understand, the next steps of this experiment are the ones you have described here:
https://docs.google.com/viewer?a=v&pid=forums&srcid=MDQ3MzU0NzU5MTM4MjQ0MDEwOTgBMDgxMTg0NDQyODM5MjI4MDIwOTUBa3R4d2pORmdlRTBKATAuMQEBdjI

I think I can make some programming contribution to this project; I will probably have some time after January 20 or later. If you see something specific to do, please let me know, as I am not aware of the current status of this project. Of course, as a native speaker I can also help verify experiments with the Polish language.






--
Ben Goertzel, PhD
http://goertzel.org

“I tell my students, when you go to these meetings, see what direction everyone is headed, so you can go in the opposite direction. Don’t polish the brass on the bandwagon.” – V. S. Ramachandran