opencog with crawled data


vishnup...@gmail.com

Aug 23, 2016, 10:59:35 AM
to opencog
Hi all,

I am new to OpenCog, so please forgive me if my questions sound stupid.
I am thinking of using crawled text data as a knowledge source for OpenCog. I guess the following steps can achieve that (?!):
  1. Run "batch-process.sh", which produces output in cff format.
  2. Run the "src/perl/cff-to-opencog.pl" Perl script. It converts cff into hypergraphs that OpenCog can understand. But how can I incorporate these hypergraphs into OpenCog? Is there any way?
  3. Are these generated hypergraphs in the form of .scm files? (If so, then I guess I can load these files into Postgres, and from the database I can load atoms into the atomspace (?!).)
  4. I also read about the "relex web crawler". It automatically adds the relex output to HGDB, where it automatically becomes a new atom. Is the crawler usable now?

Thanks in advance!

Linas Vepstas

Aug 24, 2016, 1:24:05 AM
to opencog
On Tue, Aug 23, 2016 at 9:59 AM, <vishnup...@gmail.com> wrote:
Hi all,

I am new to this Opencog. If my questions sound stupid, please forgive me.
I thought of using crawled text data as a knowledge source for opencog.

Sure
 
I guess the following steps can achieve that(?!).
  1. Running "batch-process.sh" which produces output as cff.
Yes. Not a strictly necessary step, but handy, to avoid having to re-parse your data over and over.
  2. Running "src/perl/cff-to-opencog.pl" perl script. It converts cff into hypergraphs, which is understandable by opencog.
Yes. It may be bit-rotted, as it is not used regularly. I think ManHin played with this, while ingesting simple-english wikipedia.

     But how can i incorporate these hypergraphs into opencog? Is there any way?
Just send them into the cogserver. One way to do this is to netcat them to port 17001. There are other ways too, e.g. reading them directly, as files, at the guile prompt.
  3. Are these generated hypergraphs in the form of .scm file?
Yes.
     (If so, then i guess, i can load these files into Postgres.
No. Postgres is SQL, but scm is Scheme.
     From database i can load atoms into atomspace(?!)).
Yes, but you'd need to set that up first. That is a bit harder, and is a step you can skip at this point.
  4. I also read about "relex web crawler".
It does not exist.
     It automatically add the relex output to HGDB,
That never worked.
     where it automatically becomes a new atom. Is the crawler usable now?
Nope. It was a bad idea then, it's still a bad idea now.
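
For the "send them into the cogserver" step, a minimal Python sketch of the netcat approach might look like the following. It assumes a cogserver listening on localhost at port 17001 (the port named above) and that the server's shell must first be switched into Scheme mode by sending the line `scm`. That detail, and the helper names `build_payload` and `send_to_cogserver`, are assumptions of this sketch, not something confirmed in this thread.

```python
import socket


def build_payload(scheme_text: str, enter_scheme_shell: bool = True) -> bytes:
    """Return the bytes to send: an optional "scm" line, then the Scheme text."""
    lines = []
    if enter_scheme_shell:
        # Assumption: the cogserver shell switches to Scheme mode on "scm".
        lines.append("scm")
    lines.append(scheme_text.rstrip("\n"))
    return ("\n".join(lines) + "\n").encode("utf-8")


def send_to_cogserver(scheme_text: str, host: str = "localhost",
                      port: int = 17001, timeout: float = 10.0) -> None:
    """Open a plain TCP connection (what netcat would do) and send the payload."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_payload(scheme_text))
```

To use it, read the .scm file written by cff-to-opencog.pl into a string and pass that string to `send_to_cogserver`; nothing is sent until that function is called.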

Thanks in advance!

The part you are missing is this:

It is not hard to get data into opencog, but once you have done so, what will you do with it?  

You should maybe answer this question for yourself, as you go along.

--linas 

--
You received this message because you are subscribed to the Google Groups "opencog" group.
To unsubscribe from this group and stop receiving emails from it, send an email to opencog+unsubscribe@googlegroups.com.
To post to this group, send email to ope...@googlegroups.com.
Visit this group at https://groups.google.com/group/opencog.
To view this discussion on the web visit https://groups.google.com/d/msgid/opencog/0745df89-51bf-423b-bf45-6928a4a415dd%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

vishnup...@gmail.com

Aug 24, 2016, 5:46:25 AM
to opencog
Thanks for the reply, Linas!

Once I have the data in OpenCog, I have the following ideas, which I think are doable (?!):

  1. Play with pattern mining (using MOSES, or doing frequent pattern mining), which I read about in "Engineering General Intelligence".
  2. Use this crawled data as a knowledge source for a chatbot demonstration. I think that WSD, reasoning, and some other components are not yet integrated, though. Would the chatbot still learn something from the data? Do you think it is worth trying?
I saw some discussions about embedding a deep learning architecture (TensorFlow?) into OpenCog. Is this idea still in progress?
 

Thanks and regards,
Vishnu

Linas Vepstas

Aug 25, 2016, 1:47:08 AM
to opencog
Hi Vishnu,

Not sure how to reply to your message. We have all sorts of half-finished projects, and I'm not sure which one to steer you towards. Improving chat in all sorts of different ways is a big priority.

--linas


vishnup...@gmail.com

Aug 26, 2016, 5:38:38 AM
to opencog
Yeah Linas, I understand!! Many thanks for your reply.

Regards,
Vishnu



vishnup...@gmail.com

Aug 26, 2016, 9:36:23 AM
to opencog

Thanks Linas for your reply. Actually, it guides me well!! :-)
One more question!
 My data has timestamps and latitude/longitude info along with the text. Is it still possible to get the data into opencog? Will I still be able to convert that into a .scm file?
 Is the Pattern Miner tailored to handle such data?

Regards,
    VishnuPriya

Linas Vepstas

Aug 26, 2016, 7:26:07 PM
to opencog
On Fri, Aug 26, 2016 at 8:36 AM, <vishnup...@gmail.com> wrote:

Thanks Linas for your reply. Actually, it guides me well!! :-)
One more question!
 My data has timestamps and latitude/longitude info along with the text. Is it still possible to get the data into opencog?

Sure. You would have to write some new code to do this.  Every sentence is tagged with a SentenceNode to identify it. You would have to write some code to generate the following:

(EvaluationLink
    (PredicateNode "timestamp, lat and long")
    (ListLink
        (TimeNode "Fri Aug 26 18:18:56 CDT 2016")
        (ConceptNode "51.9244° N")
        (ConceptNode "4.4777° E") 
    ))

 
Will I still be able to convert that into a .scm file?

as above.
 
 Is Pattern Miner tailored to handle such data?

? This is a non sequitur question. The pattern miner looks for patterns, in a very generic way. Don't know what it might spot. The above is just some arbitrary pattern. Maybe the pattern miner will discover that a lot of your sentences that have certain words in them always come from the same latitude. Who knows.

--linas 


Regards,
    VishnuPriya


Linas Vepstas

Aug 26, 2016, 7:29:42 PM
to opencog
Oops I forgot to add the sentence node:

(EvaluationLink
    (PredicateNode "sentencenode, timestamp, lat and long")
    (ListLink
        (SentenceNode "abc123-456-def")
        (TimeNode "Fri Aug 26 18:18:56 CDT 2016")
        (ConceptNode "51.9244° N")
        (ConceptNode "4.4777° E") 
    ))

But you could also do this:

(EvaluationLink
    (PredicateNode "sentence and location")
    (ListLink
        (SentenceNode "abc123-456-def")
        (EvaluationLink
            (PredicateNode "timestamp, lat and long")
            (ListLink
                (TimeNode "Fri Aug 26 18:18:56 CDT 2016")
                (ConceptNode "51.9244° N")
                (ConceptNode "4.4777° E")
            ))))

which allows you to use the same location in multiple sentences.  You can invent other formats too.
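
The "write some code to generate" step could be sketched in Python roughly as follows. This is only one possible generator for the nested format shown above; the function names are hypothetical, and the predicate names are copied from the examples in this thread, not from any fixed OpenCog convention.

```python
def scheme_string(text: str) -> str:
    """Quote a Python string as a Scheme string literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def location_link(timestamp: str, lat: str, lon: str) -> str:
    """Inner link: timestamp, latitude and longitude."""
    return (
        "(EvaluationLink\n"
        '    (PredicateNode "timestamp, lat and long")\n'
        "    (ListLink\n"
        f"        (TimeNode {scheme_string(timestamp)})\n"
        f"        (ConceptNode {scheme_string(lat)})\n"
        f"        (ConceptNode {scheme_string(lon)})))"
    )


def sentence_location_link(sentence_id: str, timestamp: str,
                           lat: str, lon: str) -> str:
    """Outer link tying a SentenceNode to the (reusable) location link."""
    inner_lines = location_link(timestamp, lat, lon).splitlines()
    inner = "\n".join("        " + line for line in inner_lines)
    return (
        "(EvaluationLink\n"
        '    (PredicateNode "sentence and location")\n'
        "    (ListLink\n"
        f"        (SentenceNode {scheme_string(sentence_id)})\n"
        f"{inner}))"
    )


print(sentence_location_link("abc123-456-def",
                             "Fri Aug 26 18:18:56 CDT 2016",
                             "51.9244° N", "4.4777° E"))
```

Writing the returned strings to a file, one per record, gives a .scm file of the kind discussed earlier in the thread.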

--linas

vishnup...@gmail.com

Aug 28, 2016, 1:47:33 PM
to opencog

Many thanks Linas :-) 

Regards,
Vishnu

vishnup...@gmail.com

Sep 6, 2016, 10:05:43 AM
to opencog, linasv...@gmail.com
Hello all,

I have attached a small example JSON file, generated from a Twitter stream. I will be getting lots of JSON chunks like this. How can I give this to the pattern miner, i.e. can I convert it to a hypergraph? What are the steps involved, and what would be the best way to start?
Any guidelines would be very helpful.

Thanks in advance 
twitter_json.png

vishnup...@gmail.com

Sep 7, 2016, 5:57:39 AM
to opencog, linasv...@gmail.com

I think I should do the following (?!):

Write a Python script that produces the following output for every JSON file:

(EvaluationLink
    (PredicateNode "sentence, location and body")
    (ListLink
        (SentenceNode "a unique string")
        (EvaluationLink
            (PredicateNode "coordinates, country, continent, body")
            (ListLink
                (ConceptNode "-86.3222")
                (ConceptNode "32.3934")
                (ConceptNode "US")
                (ConceptNode "northamerica")
                (ConceptNode ".....we need a new channel trump tv!!.....")
                ; ... further fields ...
            ))))


Then I can give this to the pattern miner.

Am I missing anything here?


Thanks in advance.

Linas Vepstas

Sep 9, 2016, 10:37:09 PM
to vishnupriya kumar, opencog
Hi,

On Wed, Sep 7, 2016 at 4:57 AM, <vishnup...@gmail.com> wrote:

I think, i should do the following (?!)

write probably a python script that produces the following output for every json file:

(EvaluationLink
    (PredicateNode "sentence, location and body")
    (ListLink
        (SentenceNode "a unique string")
        (EvaluationLink
            (PredicateNode "coordinates, country, continent, body")
            (ListLink
                (ConceptNode "-86.3222")
                (ConceptNode "32.3934")
                (ConceptNode "US")
                (ConceptNode "northamerica")
                (ConceptNode ".....we need a new channel trump tv!!.....")
                ; ... further fields ...
            ))))


Then i can give this to pattern miner. 

Am i missing anything here?

Well, the pattern miner won't perform any parsing of the sentences for you, so the most likely thing it will do is find that there are lots of things with (ConceptNode "US") in them, and that this is highly correlated with (ConceptNode "northamerica"). After that, it might find patterns in the lat/long. It does NOT do any string compares of the names of any nodes.

Unless you put at least WordNodes in there, you will get no text analysis.
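
To illustrate why opaque node names matter, here is a small Python sketch. It is plain co-occurrence counting, not the Pattern Miner's algorithm; the naive tokenizer, the sample tweets, and the helper names are all hypothetical, and the real NLP pipeline produces much richer structures than bare words.

```python
from collections import Counter
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    """Very naive tokenizer: lowercase, drop punctuation, split on whitespace."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    return cleaned.split()


def cooccurrence(tweets: List[Tuple[str, str]], split_words: bool) -> Counter:
    """Count (country, node name) pairs.

    With split_words=False each tweet body is one opaque node name, as in
    the ConceptNode proposal; with split_words=True every word gets its
    own node, roughly what per-word nodes would provide.
    """
    counts: Counter = Counter()
    for country, body in tweets:
        names = set(tokenize(body)) if split_words else {body}
        for name in names:
            counts[(country, name)] += 1
    return counts


tweets = [
    ("US", "trump is a candidate"),
    ("US", "trump tv"),
    ("US", "trump wins"),
    ("US", "trump president?"),
]

whole = cooccurrence(tweets, split_words=False)
words = cooccurrence(tweets, split_words=True)

# Whole-body nodes never repeat, so no pair occurs more than once.
print(max(whole.values()))      # 1
# Per-word nodes expose the word shared by all four tweets.
print(words[("US", "trump")])   # 4
```

With whole tweet bodies as node names, every node is unique and nothing repeats; only when words become separate nodes is there shared structure for any miner to count.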

--linas

Ben Goertzel

Sep 10, 2016, 4:55:54 AM
to opencog, vishnupriya kumar
You probably want to run a bunch of sentences through the full NLP
pipeline, including Relex and R2L, and then do pattern mining on the
set of logical-semantic patterns that result...



--
Ben Goertzel, PhD
http://goertzel.org

Super-benevolent super-intelligence is the thought the Global Brain is
currently struggling to form...

vishnup...@gmail.com

Sep 12, 2016, 6:50:37 PM
to opencog, vishnup...@gmail.com, linasv...@gmail.com
Hi Linas,


(EvaluationLink
    (PredicateNode "sentence, location and body")
    (ListLink  
         (EvaluationLink
             (PredicateNode "coordinates, country, continent, body")
             (ListLink
                (WordNode "-86.3222")
                (WordNode  "32.3934") 
                (WordNode  "US" )
                (WordNode  "northamerica")
                 (WordNode  ".....we need a new channel trump tv!!.....") 

            ))))

  1. Do you mean something like this? What should the input look like? (I tried giving it this, but it says unbound variable "WordNode".)
  2. Say, for example, I have:
    1. (....(WordNode "US") (WordNode "trump is a candidate") ....)
    2. (....(WordNode "US") (WordNode "trump tv") ....)
    3. (....(WordNode "US") (WordNode " trump wins ") ....)
    4. (....(WordNode "US") (WordNode " trump president?") ....)
     Will it find that sentences that contain "trump" always come from the US, i.e. (.. (WordNode "US") (WordNode "trump") ...)?

Any guidelines would be helpful 

Thanks in advance,
Vishnu

Linas Vepstas

Sep 13, 2016, 2:28:44 AM
to vishnupriya kumar, opencog
What Ben said -- you should run your data through the NLP pipeline.

vishnup...@gmail.com

Sep 13, 2016, 9:55:04 AM
to opencog, vishnup...@gmail.com, linasv...@gmail.com

As said, I did run the data through relex and R2L as in http://wiki.opencog.org/w/Running_Relex2Logic_with_OpenCog 
I did the following:

I just took a simple sentence  "apple is fruit" 
---> (nlp-parse "apple is fruit")
---> (parse-get-r2l-outputs (ParseNode "sentence@2ac41081-45a2-44c6-aae4-a95451a9ae21_parse_0" (stv 1 0.991)))
I got the following as output:

((InheritanceLink
   (ConceptNode "apple@c26f23ad-af7e-4dba-8fc2-233835445923")
   (ConceptNode "apple" (stv 0,125 0,0012484394))
)
 (InheritanceLink
   (ConceptNode "fruit@5252f531-355d-4990-bb33-f2b8a5ed26a8")
   (ConceptNode "fruit" (stv 0,125 0,0012484394))
)
 (InheritanceLink
   (ConceptNode "apple@c26f23ad-af7e-4dba-8fc2-233835445923")
   (ConceptNode "fruit@5252f531-355d-4990-bb33-f2b8a5ed26a8")
)
 (ImplicationLink
   (PredicateNode "is@bce2ae2a-7b48-4152-b625-4003bf46245d")
   (PredicateNode "be" (stv 0,33333334 0,0012484394))
)
 (EvaluationLink
   (PredicateNode "is@bce2ae2a-7b48-4152-b625-4003bf46245d")
   (ListLink
      (ConceptNode "apple@c26f23ad-af7e-4dba-8fc2-233835445923")
      (ConceptNode "fruit@5252f531-355d-4990-bb33-f2b8a5ed26a8")
   )
)
 (EvaluationLink
   (PredicateNode "is@bce2ae2a-7b48-4152-b625-4003bf46245d")
   (ListLink
      (ConceptNode "apple@c26f23ad-af7e-4dba-8fc2-233835445923")
   )
)
 (InheritanceLink
   (InterpretationNode "sentence@89241aa8-f58a-4fd4-b7be-2bd661f2ed7b_parse_0_interpretation_$X")
   (DefinedLinguisticConceptNode "DeclarativeSpeechAct")
)
 (InheritanceLink
   (PredicateNode "is@bce2ae2a-7b48-4152-b625-4003bf46245d")
   (DefinedLinguisticConceptNode "present")
)
)

Then I parsed another sentence and got its R2L results. After that, I put the R2L outputs of both sentences in an scm file (input.scm) and gave it to the pattern miner. But it threw an error (segmentation_fault.png).

How can I give it a bunch of sentences and get R2L outputs, which I can in turn give to the pattern miner?


I also thought of a way to do this:
---> converting a bunch of lines into cff using "batch-process.sh", and in turn converting that into scm with ./cff-to-opencog.pl. But it will be in the form of relex output,
so I would pick some WordInstanceNode of each sentence from the relex output and do the below to get the R2L outputs.

(cog-incoming-set (car (cog-incoming-set (ConceptNode (cog-name (WordInstanceNode "apple@2d15518b-c626-4ce3-8e6d-ecd07d3f9e46"))))))
But it would be tedious!!


In general, how can I handle this, i.e. giving it a bunch of sentences and getting R2L outputs?

segmentation_fault.png
input.scm

Linas Vepstas

Sep 13, 2016, 4:43:50 PM
to vishnupriya kumar, opencog
Ah! Now we're getting somewhere!

On Tue, Sep 13, 2016 at 8:55 AM, <vishnup...@gmail.com> wrote:

I just took a simple sentence  "apple is fruit" 
---> (nlp-parse "apple is fruit")
---> (parse-get-r2l-outputs (ParseNode "sentence@2ac41081-45a2-44c6-aae4-a95451a9ae21_parse_0" (stv 1 0.991)))
I got the following as output:

Looks good to me. 


Then i parsed another sentence and got R2L results. After that, I put  r2l outputs of both sentences in a scm file (input.scm)  and gave it to pattern miner. But it threw ERROR (segmentation_fault.png). 

:-(
OK, so .. here's the deal: 

-- Clearly, the segfault is bad, and needs to be fixed!

-- There are two versions of the pattern miner: the one here, and the one in a different (older) branch of opencog. Shujing Ke did most of her work in the older branch, and no one has ported her changes to the current code. This should also probably be done. The older branch is here: https://github.com/opencog/opencog/branches PatternMinerEmbodiment -- you can see that she has made 65 updates, but that her code is 4639 commits behind master! It might be the case that her code will not segfault; no one knows.

-- It's not entirely obvious to Nil or to me that the Pattern Miner is correctly written, anyway. We need to review it. There is a very highly specialized version of a pattern miner in the language-learning code, and I was planning on perhaps replacing that by a general-purpose miner, but have not gotten around to it. It's a big project.

TL;DR: We need someone to roll up their sleeves, and take control of the Pattern Miner, and fix it, advance it, improve it, etc.

how can i give bunch of sentences and get R2L outputs, which in turn i can give to pattern miner? 

Well, that is the magic question, isn't it?  I'm not sure what state the pattern-miner demos and examples are in. A good place to start would be to review those, and then write a new one, explicitly dealing with language issues. 
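
One low-tech way to push a bunch of sentences through the same `nlp-parse` call used earlier in this thread is to generate one call per sentence and feed the resulting text to the Scheme shell. The Python sketch below only builds that text; the helper names are hypothetical, and how the R2L output is then collected for the Pattern Miner is left open, as it is in the discussion above.

```python
from typing import Iterable, List


def scheme_string(text: str) -> str:
    """Quote a Python string as a Scheme string literal."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def nlp_parse_calls(sentences: Iterable[str]) -> List[str]:
    """Return one (nlp-parse "...") expression per non-empty sentence."""
    calls = []
    for sentence in sentences:
        # Collapse newlines and runs of spaces into single spaces.
        sentence = " ".join(sentence.split())
        if sentence:
            calls.append(f"(nlp-parse {scheme_string(sentence)})")
    return calls


script = "\n".join(nlp_parse_calls(["apple is fruit", "  ", 'he said "hi"']))
print(script)
# (nlp-parse "apple is fruit")
# (nlp-parse "he said \"hi\"")
```

Blank lines are skipped and embedded double quotes are escaped, so the generated text stays valid Scheme even for messy input such as tweets.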


I also thought a way to do this:  
--->  converting bunch of lines into cff  by using "batch-process.sh"  and in turn converting  that into scm ./cff-to-opencog.pl .

cff is useful only for saving some CPU time during bulk processing.  Right now, the system is not ready for bulk processing, so saving some CPU cycles is not worth the effort.
 
But it will be in the form of relex output.
so picking some WordInstanceNode of each sentence from the relex output and doing the below to get R2L outputs. 

(cog-incoming-set (car (cog-incoming-set (ConceptNode (cog-name (WordInstanceNode "apple@2d15518b-c626-4ce3-8e6d-ecd07d3f9e46"))))))
But it would be tedious!!

Why is that tedious? That's more or less how you're supposed to do it: it's a giant graph, you have to chase the edges of the graph to get what you want. Your code is not the most elegant way to chase through an edge, but it's not atypical. There are various InheritanceLinks, etc. in place to simplify such searches. There are also various utilities and macros for some of this stuff (in the utilities.scm and nlp-utilities.scm files)
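
The idea of chasing edges through incoming sets can be mimicked with a toy model. The Python sketch below is not the OpenCog API: `ToyAtomSpace`, `add_link` and `incoming` are made-up names that only imitate what `cog-incoming-set` does in the expression above, namely returning the links that directly contain a given atom. The shortened node names are placeholders too.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Atom = Tuple  # e.g. ("ConceptNode", "apple") or ("ListLink", atom, atom)


class ToyAtomSpace:
    """Stores links and remembers, for every atom, which links contain it."""

    def __init__(self) -> None:
        self._incoming: Dict[Atom, List[Atom]] = defaultdict(list)

    def add_link(self, link_type: str, *outgoing: Atom) -> Atom:
        """Create a link over the given atoms and index it by each member."""
        link = (link_type, *outgoing)
        for atom in outgoing:
            self._incoming[atom].append(link)
        return link

    def incoming(self, atom: Atom) -> List[Atom]:
        """Toy counterpart of cog-incoming-set: links that hold this atom."""
        return list(self._incoming.get(atom, []))


space = ToyAtomSpace()
apple_instance = ("ConceptNode", "apple@123")
fruit_instance = ("ConceptNode", "fruit@456")
inherits = space.add_link("InheritanceLink", apple_instance, fruit_instance)
args = space.add_link("ListLink", apple_instance, fruit_instance)
evaluation = space.add_link("EvaluationLink", ("PredicateNode", "is@789"), args)

# One hop: every link that mentions the apple instance directly.
first_hop = space.incoming(apple_instance)
# Second hop: links that contain the ListLink, here the EvaluationLink.
second_hop = space.incoming(args)
```

Each call to `incoming` moves one level up the graph, which is why the nested `cog-incoming-set` expression needs two calls to get from a word instance to the links built on top of it.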

--linas

vishnup...@gmail.com

Sep 15, 2016, 11:12:58 AM
to opencog, vishnup...@gmail.com, linasv...@gmail.com

So should I report this segmentation fault on GitHub?

--Thanks
Vishnu

Linas Vepstas

Sep 15, 2016, 12:45:56 PM
to vishnupriya kumar, opencog
On Thu, Sep 15, 2016 at 10:12 AM, <vishnup...@gmail.com> wrote:

So should i post this segmentation fault in github? 

Sure. It would be better if you fixed it!