Pattern mining from PLN inference histories

Ben Goertzel

May 21, 2017, 11:17:58 AM

to Nil Geisweiller, opencog, Shujing Ke

Nil,

I wrote down our two sketchy examples of patterns to be mined from PLN
inference patterns, from our F2F discussion last week, here:

http://wiki.opencog.org/w/Pattern_Miner_Prospective_Examples#patterns_in_PLN_inference_histories

It will be good if you can write these out in the fully explicit
Atomese format that PLN actually uses to save its inference
histories...

thx!
ben

--
Ben Goertzel, PhD
http://goertzel.org

"I am God! I am nothing, I'm play, I am freedom, I am life. I am the
boundary, I am the peak." -- Alexander Scriabin

Nil Geisweiller

May 23, 2017, 10:06:32 AM

to Ben Goertzel, opencog, Shujing Ke

Hi,

I've corrected the inferences (note that the ExecutionLinks are actually
ExecutionOutputLinks, because the "inference trails" are actually
inferences to be executed rather than records).

Also I've attached a file with ~500 inferences obtained from running the
BackwardChainerUTest, can generate many more if needed.

Nil

BackwardChainerUTest-inferences.scm.xz

Shujing Ke

May 31, 2017, 6:32:53 PM

to Nil Geisweiller, Ben Goertzel, opencog

Hi, Nil and Ben,

I studied the corpus. Is each BindLink one instance of inference? If so, should each BindLink be considered primitive / atomic, so that one pattern is one BindLink and any Links inside a BindLink are not mined separately? For example,

      (InheritanceLink
        (VariableNode "$X")
        (PatternVariableNode "var1")
      )
      (InheritanceLink
        (VariableNode "$X")
        (VariableNode "$B-6266d6f2")
      )
      (InheritanceLink
        (VariableNode "$B-6266d6f2")
        (PatternVariableNode "var1")
      )

This is a pattern that may be mined by the pattern miner from the PLN corpus for general purposes. But it is not the kind of pattern expected, as described in http://wiki.opencog.org/w/Pattern_Miner_Prospective_Examples#patterns_in_PLN_inference_histories

Actually, the particular goal here is not to mine arbitrary connected patterns freely; it is to mine a particular type of pattern: abstractions of BindLinks that have the same structure. If two BindLinks have different structures, even if they share one or several Nodes, patterns still should not be extracted from them. For example,

(BindLink
  (LinkTypeA
    (NodeType_a "someNode1")
    (NodeType_b "someNode2")
  )
  (LinkTypeB
    (NodeType_c "someNode3")
    (LinkTypeC
      (NodeType_c "someNode3")
      (NodeType_d "someNode4")
    )
  )
)

(BindLink
  (LinkTypeA
    (NodeType_a "someNode1")
    (NodeType_e "someNode5")
  )
  (LinkTypeD
    (NodeType_e "someNode5")
    (NodeType_f "someNode6")
  )
)

These two BindLinks share the node (NodeType_a "someNode1"), so a common pattern rooted at LinkTypeA could be extracted when mining general patterns. But the two BindLinks have different structures: the first contains a LinkTypeA, a LinkTypeB and a LinkTypeC; the second contains a LinkTypeA and a LinkTypeD. So, setting aside the ultimate goal of AGI, to learn this type of pattern more effectively it would be better to first find all the BindLinks with the same structure, and then apply some kind of inductive learning algorithm to them. What do you think?

But I will still give it a try with Pattern Miner.

Nil Geisweiller

Jun 1, 2017, 1:25:06 AM

to Shujing Ke, Nil Geisweiller, Ben Goertzel, opencog

Hi,

On 06/01/2017 01:32 AM, Shujing Ke wrote:
> Hi, Nil and Ben,
>
> I studied the corpus. Is each BindLink one instance of inference? So

Yes.

> that each BindLink should be considered as primitve / atomic - one
> pattern should be one BindLink; any Links inside a BindLink should not
> be mined separatly, right? For example,

No, they can and should be mined separately as well. Specifically, what we
are interested in are the structures of ExecutionOutputLink (EOL). The
third argument of an inference BindLink is systematically gonna be an
EOL wrapping other EOLs, and we are mostly interested in mining these
EOLs. But ultimately mining the whole BindLink might be useful too. We
may want to do both, but for starters only mine patterns with an EOL as
root link.

No, we want to extract patterns across BindLinks (or EOLs) that have
different structures, which I believe the pattern miner is good at, right?
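
As a rough illustration of restricting mining to patterns rooted at an EOL, the sketch below walks one inference and collects every ExecutionOutputLink subtree at any depth. The nested-tuple encoding of atoms and the helper names `is_node` and `collect_eols` are assumptions made for this example; they are not the Pattern Miner's API.

```python
# Atoms are modelled as nested tuples (an assumption of this sketch):
#   node: (type, name)        e.g. ("ConceptNode", "D")
#   link: (type, child, ...)  e.g. ("ListLink", child1, child2)

def is_node(atom):
    return len(atom) == 2 and isinstance(atom[1], str)

def collect_eols(atom, found=None):
    """Return every ExecutionOutputLink subtree, at any depth, outermost first."""
    if found is None:
        found = []
    if is_node(atom):
        return found
    if atom[0] == "ExecutionOutputLink":
        found.append(atom)
    for child in atom[1:]:
        collect_eols(child, found)
    return found

X = ("VariableNode", "$X")
inner = ("ExecutionOutputLink",
         ("GroundedSchemaNode", "scm: bc-deduction-formula"),
         ("ListLink", ("InheritanceLink", X, ("ConceptNode", "D"))))
outer = ("ExecutionOutputLink",
         ("GroundedSchemaNode", "scm: bc-deduction-formula"),
         ("ListLink", inner, ("InheritanceLink", X, ("ConceptNode", "A"))))
# Placeholder variable declaration and clauses; only the third argument matters here.
bind = ("BindLink", ("VariableList",), ("AndLink",), outer)

eols = collect_eols(bind)
print(len(eols))
```

Each collected subtree could then be handed to the miner as its own root link, which is one way to read "only mine patterns with an EOL as root link".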

Nil

Shujing Ke

Jun 1, 2017, 9:52:47 AM

to Nil Geisweiller, Ben Goertzel, opencog

OK, I will try to mine EOLs first. Thanks : )

Shujing

Shujing Ke

Jun 1, 2017, 10:00:02 AM

to Nil Geisweiller, Ben Goertzel, opencog

Oh, another question: should I mine patterns that contain at least one ExecutionOutputLink, or patterns that contain only ExecutionOutputLinks and the Links inside them?

Nil Geisweiller

Jun 1, 2017, 11:17:31 AM

to Shujing Ke, Nil Geisweiller, Ben Goertzel, opencog

On 06/01/2017 04:59 PM, Shujing Ke wrote:
> Oh, another question: is to mine patterns that contains at least one
> ExecutionOutputLink, or to mine patterns that only contains
> ExecutionOutputLinks and the Links inside ExecutionOutputLinks?

I'd say all of them, at any depth. The corpus I gave you is not gonna
contain any useful pattern anyway, it's just an exercise at this point.

Nil

Shujing Ke

Jun 1, 2017, 5:00:24 PM

to opencog, Nil Geisweiller, Ben Goertzel

Ok : )

--
You received this message because you are subscribed to the Google Groups "opencog" group.
To unsubscribe from this group and stop receiving emails from it, send an email to opencog+unsubscribe@googlegroups.com.
To post to this group, send email to ope...@googlegroups.com.
Visit this group at https://groups.google.com/group/opencog.
To view this discussion on the web visit https://groups.google.com/d/msgid/opencog/babb8f59-9817-f9f3-218b-2975cca792d3%40gmail.com.
For more options, visit https://groups.google.com/d/optout.

Shujing Ke

Jun 4, 2017, 5:06:57 PM

to opencog, Nil Geisweiller, Ben Goertzel

Hi Nil and Ben,

VariableNodes with the same name in different clauses of the PLN corpus do not necessarily mean the same thing, so even when they share a name they should not be considered connected. Conversely, VariableNodes with different names in different clauses could still mean the same thing. For example:

Clause 1:
(ExecutionOutputLink
    (GroundedSchemaNode "scm: bc-deduction-formula") ;
    (ListLink
      (InheritanceLink
        (VariableNode "$X") ;
        (ConceptNode "D") ;
      ) ;
      (InheritanceLink
        (VariableNode "$X") ;
        (VariableNode "$B-6266d6f2")
      )
      (InheritanceLink
        (VariableNode "$B-6266d6f2")
        (ConceptNode "D")
      )
    )
)

Clause 2:
(ExecutionOutputLink
    (GroundedSchemaNode "scm: xxxxx-formula") ;
    (ListLink
      (AAALink
        (VariableNode "$X") ;
        (ConceptNode "R") ;
      ) ;
      (BBLink
        (VariableNode "$X") ;
        (VariableNode "$Y")
      )
      (CCCLink
        (VariableNode "$Y")
        (ConceptNode "R")
      )
    )
)

Clause 3:
(ExecutionOutputLink
    (GroundedSchemaNode "scm: bc-deduction-formula") ;
    (ListLink
      (InheritanceLink
        (VariableNode "$W") ;
        (ConceptNode "D") ;
      ) ;
      (InheritanceLink
        (VariableNode "$Z") ;
        (VariableNode "$B-FFFF")
      )
      (InheritanceLink
        (VariableNode "$B-FFFF")
        (ConceptNode "D")
      )
    )
)

Clause 4:
(ExecutionOutputLink
    (GroundedSchemaNode "scm: bc-deduction-formula") ;
    (ListLink
      (InheritanceLink
        (VariableNode "$M") ;
        (ConceptNode "P") ;
      ) ;
      (InheritanceLink
        (VariableNode "$N") ;
        (VariableNode "$B-FFFF")
      )
      (InheritanceLink
        (VariableNode "$B-FFFF")
        (ConceptNode "P")
      )
    )
)

(VariableNode "$X") exists in both clause 1 and clause 2, but the two occurrences do not necessarily mean the same thing, so they should not be connected.

Clauses 1 and 3 have the common patterns 1, 2 and 3; clauses 1 and 4 have the common pattern 3:

Pattern 1:
(ExecutionOutputLink
    (GroundedSchemaNode "scm: bc-deduction-formula")
    (ListLink
      (InheritanceLink
        (PatternVariableNode "$var1")
        (ConceptNode "D")
      )
      (InheritanceLink
        (PatternVariableNode "$var1")
        (VariableNode "$B-6266d6f2")
      )
      (InheritanceLink
        (VariableNode "$B-6266d6f2")
        (ConceptNode "D")
      )
    )
)

Pattern 2:
(ExecutionOutputLink
    (GroundedSchemaNode "scm: bc-deduction-formula")
    (ListLink
      (InheritanceLink
        (PatternVariableNode "$var1")
        (ConceptNode "D")
      )
      (InheritanceLink
        (PatternVariableNode "$var1")
        (PatternVariableNode "$var2")
      )
      (InheritanceLink
        (PatternVariableNode "$var2")
        (ConceptNode "D")
      )
    )
)

Pattern 3:
(ExecutionOutputLink
    (GroundedSchemaNode "scm: bc-deduction-formula")
    (ListLink
      (InheritanceLink
        (PatternVariableNode "$var1")
        (PatternVariableNode "$var3")
      )
      (InheritanceLink
        (PatternVariableNode "$var1")
        (PatternVariableNode "$var2")
      )
      (InheritanceLink
        (PatternVariableNode "$var2")
        (PatternVariableNode "$var3")
      )
    )
)

Actually, I guess the expected pattern here is Pattern 3, but the exact expected format of it is Pattern 4:
Pattern 4:
(ExecutionOutputLink
    (GroundedSchemaNode "scm: bc-deduction-formula")
    (ListLink
      (InheritanceLink
        (VariableNode "$var1")
        (PatternVariableNode "$pattern_var1")
      )
      (InheritanceLink
        (VariableNode "$var1")
        (VariableNode "$var2")
      )
      (InheritanceLink
        (VariableNode "$var2")
        (PatternVariableNode "$pattern_var1")
      )
    )
)

Which means that in the pattern miner's process, I should probably do this:

1. Do not consider any VariableNodes to be connected with each other across clauses, even if they have the same name.

2. The original VariableNodes in a clause should not be abstracted as normal PatternVariableNodes, but they have to be given unique names for intermediate processing, because there could be even more confusing situations like clause 5 and clause 1, where (VariableNode "$B-6266d6f2") and (VariableNode "$X") are just swapped but the two are actually the same clause.

Clause 5:
(ExecutionOutputLink
    (GroundedSchemaNode "scm: bc-deduction-formula")
    (ListLink
      (InheritanceLink
        (VariableNode "$B-6266d6f2")
        (ConceptNode "D")
      )
      (InheritanceLink
        (VariableNode "$B-6266d6f2") ;
        (VariableNode "$X") ;
      )
      (InheritanceLink
        (VariableNode "$X")
        (ConceptNode "D")
      )
    )
)

3. After turning each original VariableNode into a unique VariableNode, extract patterns in the format of Pattern 3, then turn the PatternVariableNodes that stand for original VariableNodes back into VariableNodes, and then unify the variable names in the order of their appearance, so as to get Pattern 4.

4. (optional) If necessary, the output format can also be turned into Pattern 5, depending on whether we need to distinguish the VariableNodes that represent the original VariableNodes or not.
Pattern 5:
(ExecutionOutputLink
    (GroundedSchemaNode "scm: bc-deduction-formula")
    (ListLink
      (InheritanceLink
        (VariableNode "$var1")
        (VariableNode "$var3")
      )
      (InheritanceLink
        (VariableNode "$var1")
        (VariableNode "$var2")
      )
      (InheritanceLink
        (VariableNode "$var2")
        (VariableNode "$var3")
      )
    )
)

5. (optional) If necessary, TypedVariableLinks can be added to specify the original VariableNodes:
(TypedVariableLink
  (VariableNode "$var1")
  (TypeNode "VariableNode")
)
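
The renaming idea in steps 1 to 3 can be sketched as follows: within a single clause, VariableNodes are renamed to $var1, $var2, ... in order of first appearance, so that clause 1 and clause 5 (which differ only by swapped variable names) end up identical. The nested-tuple encoding and the helper names `canonicalize`, `eol` and `inh` are assumptions made for illustration, not Pattern Miner code.

```python
# Atoms as nested tuples (assumption): node = (type, name), link = (type, child, ...).

def is_node(atom):
    return len(atom) == 2 and isinstance(atom[1], str)

def canonicalize(atom, mapping=None):
    """Rename VariableNodes to $var1, $var2, ... in order of first appearance."""
    if mapping is None:
        mapping = {}  # one fresh mapping per clause, so names never link clauses
    if is_node(atom):
        if atom[0] == "VariableNode":
            if atom[1] not in mapping:
                mapping[atom[1]] = "$var%d" % (len(mapping) + 1)
            return ("VariableNode", mapping[atom[1]])
        return atom
    return (atom[0],) + tuple(canonicalize(c, mapping) for c in atom[1:])

def inh(a, b):
    return ("InheritanceLink", a, b)

def eol(*premises):
    return ("ExecutionOutputLink",
            ("GroundedSchemaNode", "scm: bc-deduction-formula"),
            ("ListLink",) + premises)

X = ("VariableNode", "$X")
B = ("VariableNode", "$B-6266d6f2")
D = ("ConceptNode", "D")

clause1 = eol(inh(X, D), inh(X, B), inh(B, D))
clause5 = eol(inh(B, D), inh(B, X), inh(X, D))

print(canonicalize(clause1) == canonicalize(clause5))
```

Because the mapping is created per call, a "$X" in one clause and a "$X" in another are renamed independently, which matches step 1.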

Is this process OK?

One more question:
Should GroundedSchemaNode also become a VariableNode? Or doesn't it make much sense to treat it as a variable?

Thanks,

Shujing

Ben Goertzel

Jun 4, 2017, 9:31:27 PM

to Shujing Ke, opencog, Nil Geisweiller

On Mon, Jun 5, 2017 at 5:06 AM, Shujing Ke <shuj...@gmail.com> wrote:
> Is this process OK?

too busy right now, so I'll let Nil study and answer

> One more question:
> Is GroundedSchemaNode also to become variablenode? or doesn't make much
> sense to consider it as variablenode?

There will be cases where it's considered as a VariableNode...

Nil Geisweiller

Jun 5, 2017, 3:09:57 AM

to Shujing Ke, opencog, Nil Geisweiller, Ben Goertzel

Hi Shujing,

that is where CHandle could be useful. If they are equal it means they
are bound to the same scope, and thus should be considered the same. In
practice you won't find many patterns with persistent original variables
because these variables will have different scopes most of the time,
thus they will likely be replaced by pattern variables.

So for example if you get the 2 groundings

> (ExecutionOutputLink
>     (GroundedSchemaNode "scm: bc-deduction-formula")
>     (ListLink
>       (InheritanceLink
>         (VariableNode "$X") ; <- bound to scope-1
>         (ConceptNode "D")
>       )
>       (InheritanceLink
>         (VariableNode "$X") ; <- bound to scope-1
>         (VariableNode "$B-6266d6f2") ; <- bound to scope-1
>       )
>       (InheritanceLink
>         (VariableNode "$B-6266d6f2") ; <- bound to scope-1
>         (ConceptNode "D")
>       )
>     )
> )

> (ExecutionOutputLink
>     (GroundedSchemaNode "scm: bc-deduction-formula")
>     (ListLink
>       (InheritanceLink
>         (VariableNode "$X") ; <- bound to scope-2
>         (ConceptNode "D")
>       )
>       (InheritanceLink
>         (VariableNode "$X") ; <- bound to scope-2
>         (ConceptNode "A")
>       )
>       (InheritanceLink
>         (ConceptNode "A")
>         (ConceptNode "D")
>       )
>     )
> )

You will produce the pattern

> (ExecutionOutputLink
>     (GroundedSchemaNode "scm: bc-deduction-formula")
>     (ListLink
>       (InheritanceLink
>         (VariableNode "$pattern-var1")
>         (ConceptNode "D")
>       )
>       (InheritanceLink
>         (VariableNode "$pattern-var1")
>         (VariableNode "$pattern-var2")
>       )
>       (InheritanceLink
>         (VariableNode "$pattern-var2")
>         (ConceptNode "D")
>       )
>     )
> )

The type of $pattern-var1 would be VariableNode (as you suggest below),
and the type of $pattern-var2 would be Node, because it's the least
abstract union type of VariableNode (for the original variable
(VariableNode "$B-6266d6f2")) and ConceptNode (for (ConceptNode "A")).

But perhaps you don't need to worry about typing pattern variables for now,
unless your code already takes care of it.
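
To make the typing argument concrete, here is a minimal sketch that generalizes the two groundings above position by position and assigns each pattern variable the least common supertype of what it replaces. Everything in it is an assumption for illustration: the tuple encoding, the tiny `PARENT` slice of the type hierarchy, the `anti_unify` helper, and the "@scope-N" suffix, which only stands in for a real scope (CHandle) comparison.

```python
# Simplified slice of the Atomese type hierarchy (assumption: only what the example needs).
PARENT = {"ConceptNode": "Node", "VariableNode": "Node", "Node": "Atom", "Link": "Atom"}

def ancestors(type_name):
    chain = [type_name]
    while type_name in PARENT:
        type_name = PARENT[type_name]
        chain.append(type_name)
    return chain

def least_common_type(a, b):
    others = set(ancestors(b))
    return next((t for t in ancestors(a) if t in others), "Atom")

def is_node(atom):
    return len(atom) == 2 and isinstance(atom[1], str)

def anti_unify(a, b, pairs, types):
    """Keep what is equal in a and b; replace each differing pair by a pattern variable."""
    if a == b:
        return a
    if not is_node(a) and not is_node(b) and a[0] == b[0] and len(a) == len(b):
        return (a[0],) + tuple(anti_unify(x, y, pairs, types)
                               for x, y in zip(a[1:], b[1:]))
    if (a, b) not in pairs:  # the same pair always maps to the same pattern variable
        name = "$pattern-var%d" % (len(pairs) + 1)
        pairs[(a, b)] = name
        types[name] = least_common_type(a[0], b[0])
    return ("VariableNode", pairs[(a, b)])

def deduction(x, mid, top):
    return ("ExecutionOutputLink",
            ("GroundedSchemaNode", "scm: bc-deduction-formula"),
            ("ListLink",
             ("InheritanceLink", x, top),
             ("InheritanceLink", x, mid),
             ("InheritanceLink", mid, top)))

D = ("ConceptNode", "D")
g1 = deduction(("VariableNode", "$X@scope-1"),
               ("VariableNode", "$B-6266d6f2@scope-1"), D)
g2 = deduction(("VariableNode", "$X@scope-2"), ("ConceptNode", "A"), D)

pairs, types = {}, {}
pattern = anti_unify(g1, g2, pairs, types)
print(types)
```

In this sketch `types` maps $pattern-var1 to VariableNode and $pattern-var2 to Node, matching the reasoning above.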

See more below.

On 06/05/2017 12:06 AM, Shujing Ke wrote:
> *Clause 1:*

> (ExecutionOutputLink
> (GroundedSchemaNode "scm: bc-deduction-formula") ;
> (ListLink
> (InheritanceLink
> (VariableNode "$X") ;
> (ConceptNode "D") ;
> ) ;
> (InheritanceLink
> (VariableNode "$X") ;
> (VariableNode "$B-6266d6f2")
> )
> (InheritanceLink
> (VariableNode "$B-6266d6f2")
> (ConceptNode "D")
> )
> )
> )
>
>

> *Clause 2:*

> (ExecutionOutputLink
> (GroundedSchemaNode "scm: xxxxx-formula") ;
> (ListLink
> (AAALink
> (VariableNode "$X") ;
> (ConceptNode "R") ;
> ) ;
> (BBLink
> (VariableNode "$X") ;
> (VariableNode "$Y")
> )
> (CCCLink
> (VariableNode "$Y")
> (ConceptNode "R")
> )
> )
> )

> (VariableNode "$X") exist in both clause 1 and 2, but they do not really
> have to mean the same thing, so they should not connect.

Again, if they are bound to the same scope then they should mean the
same thing. That would be the case if both clauses belong to the same
large scoped tree, and you're trying to find patterns inside this tree.
Not probable but possible.

>
> clause 1 and 3 have the common pattern 1,2 and 3; clause 1 and 4 have
> the common pattern 3:

> *
> Pattern 1:*

> (ExecutionOutputLink
> (GroundedSchemaNode "scm: bc-deduction-formula")
> (ListLink
> (InheritanceLink
> (PatternVariableNode "$var1")
> (ConceptNode "D")
> )
> (InheritanceLink
> (PatternVariableNode "$var1")
> (VariableNode "$B-6266d6f2")
> )
> (InheritanceLink
> (VariableNode "$B-6266d6f2")
> (ConceptNode "D")
> )
> )
> )

My suggestion is: only bind pattern variables to the pattern scope (like
SatisfyingSetScopeLink) and leave the original variables unbound. That way
you don't need to introduce a new PatternVariableNode type to make the
distinction between pattern variables and original variables treated as
constants. However, prefixing the pattern variable names with "pattern", as
you did further below, is a good idea for human readability.

>
> *Pattern 2:*

> (ExecutionOutputLink
> (GroundedSchemaNode "scm: bc-deduction-formula")
> (ListLink
> (InheritanceLink
> (PatternVariableNode "$var1")
> (ConceptNode "D")
> )
> (InheritanceLink
> (PatternVariableNode "$var1")
> (PatternVariableNode "$var2")
> )
> (InheritanceLink
> (PatternVariableNode "$var2")
> (ConceptNode "D")
> )
> )
> )
>

> *Pattern 3:*

> (ExecutionOutputLink
> (GroundedSchemaNode "scm: bc-deduction-formula")
> (ListLink
> (InheritanceLink
> (PatternVariableNode "$var1")
> (PatternVariableNode "$var3")
> )
> (InheritanceLink
> (PatternVariableNode "$var1")
> (PatternVariableNode "$var2")
> )
> (InheritanceLink
> (PatternVariableNode "$var2")
> (PatternVariableNode "$var3")
> )
> )
> )

Pattern 3 is more abstract than pattern 2. Would you not want to return
the least abstract patterns possible with the greatest support (or
greatest given fitness)? But it's another issue anyway...

>
> Acutally, I guess the expected pattern here is Pattern 3, but the exact
> expected format of it is Pattern 4:

> *Pattern 4:*

> (ExecutionOutputLink
> (GroundedSchemaNode "scm: bc-deduction-formula")
> (ListLink
> (InheritanceLink
> (VariableNode "$var1")
> (PatternVariableNode "$pattern_var1")
> )
> (InheritanceLink
> (VariableNode "$var1")
> (VariableNode "$var2")
> )
> (InheritanceLink
> (VariableNode "$var2")
> (PatternVariableNode "$pattern_var1")
> )
> )
> )

Again, just have $pattern_var1 scoped to the SatisfyingSetScopeLink of
the pattern, and leave $var1 and $var2 free. Again, these patterns, with
original variables, are gonna be unlikely (in my use cases anyway), but
they might be meaningful in some situations.

>
> Which means in the process of pattern miner, I probably should do this:

> *1*. Do not consider any VariableNodes are connected with each other

> out of a clause, even they have the same name.

Again, only assume that variables with different scopes are not
connected to each other.

> *5*. (optional), if it is necessary, TypedVariableLinks can be added to

> specify the original variablenodes:
> (TypedVariableLink
> (VariableNode "$var1")
> (TypeNode "VariableNode")
> )

Yes, only if var1 is a pattern variable, not if it is an original
variable, and this type declaration would be inserted in the variable
declaration of the pattern scope (like SatisfyingSetScopeLink). If it is
an original variable leave it as is; it's unlikely anyway, so even if it
turns out to be problematic we can worry about that later.

>
> Is this process OK?
>
> One more question:
> Is GroundedSchemaNode also to become variablenode?

Possibly, as Ben said.

Nil

Shujing Ke

Jun 14, 2017, 9:32:36 AM

to Nil Geisweiller, opencog, Ben Goertzel

Hi, Nil and Ben,

Here are some sample patterns from the first results on part of the corpus. Can you look at them to see whether they contain any expected patterns, and any patterns that we don't want?

Several issues:

1. Current settings:

1) Only mine 1-gram patterns; the reason is explained in 2.

2) Only mine ExecutionOutputLinks in this test. We can try to mine other Links later.

3) GroundedSchemaNode and TypeNode are not considered for becoming VariableNodes. It doesn't seem to make a lot of sense to make them into variables. Also, if they become variables, there are errors, because the atom core checks the types and throws the opencog::SyntaxExceptions below:
Unexpected contents in TypedVariableLink Expected type specifier (e.g. TypeNode, TypeChoice, etc.), got PatternVariableNode

and
ExecutionOutputLink must have schema! Got (PatternVariableNode ...

4) At the current stage, both VariableNode and PatternVariableNode are used, to distinguish the variables generated by the Pattern Miner from the original variables. We can unify them later if needed.

2. About pattern gram

Actually, in this application only 1-gram patterns are wanted. In previous applications, like mining from DBPedia and ConceptNet, an n-gram pattern contains n Links, but each Link is relatively small, like one EvaluationLink or InheritanceLink. For example:

EvaluationLink
    PredicateNode "Country"
    ListLink
        "var_1"
        "USA"
EvaluationLink
    PredicateNode "Language"
    ListLink
        "var_1"
        "English"

Above is a 2-gram pattern, because each fact in DBPedia is just one EvaluationLink. These two Links are connected via the variable node "var_1". But in the PLN corpus case, each ExecutionOutputLink is big and contains a lot of Links. It seems we don't really want patterns that contain multiple ExecutionOutputLinks in one single pattern; rather, we want to find the common abstract patterns of each single ExecutionOutputLink. So it is actually just 1-gram.

3. The interestingness evaluation is different from previous applications

Our interestingness evaluation is based on a surprisingness measure, which includes Surprisingness_I and Surprisingness_II:
Surprisingness_I: how difficult it is to infer the actual frequency of an n-gram pattern from the frequencies of all its (n-1)-gram to 1-gram subpatterns.
Surprisingness_II: how difficult it is to infer the actual frequency of an n-gram pattern from the frequencies of all its (n+1)-gram superpatterns.

But in the PLN corpus we only mine 1-gram patterns, and I guess the interesting patterns you want to identify here are the patterns of "the max degree of abstraction", for example:

pattern1: (x and y are friends) (x is musician) (y is musician) (z is musician) (z and y are friends)->(x and z are friends)
pattern2: (x and y are friends) (x is var_job) (y is var_job) (z is var_job) (z and y are friends)->(x and z are friends)

If pattern 1 occurs 10 times and pattern 2 also occurs 10 times, it means that pattern 2 only holds when var_job = musician, so abstracting to pattern 2 makes no sense. Pattern 1 is therefore already the max degree of abstraction in this case. If my understanding is right, then I will need to write a new interestingness evaluation for this, because it is different from the surprisingness measure.
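
One way to read this "max degree of abstraction" criterion is as a filter: drop a pattern when a strictly more specific pattern has exactly the same count. This is similar in spirit to keeping only closed patterns in frequent pattern mining. The sketch below is only an assumption about how such a filter might look; the tuple encoding and the names `generalizes` and `drop_redundant_abstractions` are hypothetical, and the patterns are cut down to a single link for brevity.

```python
# Atoms as nested tuples (assumption): node = (type, name), link = (type, child, ...).

def is_node(atom):
    return len(atom) == 2 and isinstance(atom[1], str)

def generalizes(p, q, binding=None):
    """True if q is obtained from p by consistently substituting p's pattern variables."""
    if binding is None:
        binding = {}
    if is_node(p) and p[0] == "PatternVariableNode":
        if p[1] in binding:
            return binding[p[1]] == q
        binding[p[1]] = q
        return True
    if is_node(p) or is_node(q):
        return p == q
    return (p[0] == q[0] and len(p) == len(q)
            and all(generalizes(x, y, binding) for x, y in zip(p[1:], q[1:])))

def drop_redundant_abstractions(counted):
    """Drop a pattern if a strictly more specific one has exactly the same count."""
    kept = []
    for p, n in counted:
        redundant = any(n == m and generalizes(p, q) and not generalizes(q, p)
                        for q, m in counted)
        if not redundant:
            kept.append((p, n))
    return kept

x = ("PatternVariableNode", "$x")
is_musician = ("InheritanceLink", x, ("ConceptNode", "musician"))
is_var_job = ("InheritanceLink", x, ("PatternVariableNode", "$var_job"))

same_support = drop_redundant_abstractions([(is_musician, 10), (is_var_job, 10)])
more_support = drop_redundant_abstractions([(is_musician, 10), (is_var_job, 25)])
print(len(same_support), len(more_support))
```

With equal counts only the "musician" pattern survives; if the abstract pattern occurs more often, both are kept, since the abstraction then covers extra instances.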

4. Unify the link order in unordered Links

At the current stage, I haven't unified the order of Links inside unordered Links in the corpus, such as AndLink. For example, in the AndLink below from the corpus, the 3 EvaluationLinks may appear in a different order in different instances, which affects the structure of the AndLink, although they should actually be order independent. So if this situation does occur in the PLN corpus or in other applications in the future, I may need to normalize the order of the input Links to a unique order (just like the pattern isomorphism problem I solved before in the pattern miner). But I am not sure I can find a way to give all of them a unique order; in the worst case, we have to generate all the possible orderings for each unordered Link in its patterns.

                  (AndLink
                    (EvaluationLink
                      (PredicateNode "are-friends")
                      (ListLink
                        (ConceptNode "John")
                        (VariableNode "$Y-37aad5ea")))
                    (EvaluationLink
                      (PredicateNode "is-musician")
                      (VariableNode "$Y-37aad5ea"))
                    (EvaluationLink
                      (PredicateNode "is-musician")
                      (ConceptNode "John")))
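
A minimal sketch of one possible normalization: recursively sort the children of unordered Links by their printed form, so that permuted AndLinks compare equal. The tuple encoding, the `canonical` helper and the short `UNORDERED` list are assumptions for illustration. Note the limitation: the sort key depends on variable names, so this alone does not settle cases where instances also differ in variable naming, which is the harder isomorphism problem mentioned above.

```python
# Atoms as nested tuples (assumption): node = (type, name), link = (type, child, ...).
UNORDERED = {"AndLink", "OrLink", "SetLink"}  # partial list, enough for the example

def is_node(atom):
    return len(atom) == 2 and isinstance(atom[1], str)

def canonical(atom):
    """Return a copy of atom in which children of unordered Links are sorted."""
    if is_node(atom):
        return atom
    children = tuple(canonical(c) for c in atom[1:])
    if atom[0] in UNORDERED:
        children = tuple(sorted(children, key=repr))
    return (atom[0],) + children

john = ("ConceptNode", "John")
y = ("VariableNode", "$Y-37aad5ea")
friends = ("EvaluationLink", ("PredicateNode", "are-friends"), ("ListLink", john, y))
musician_y = ("EvaluationLink", ("PredicateNode", "is-musician"), y)
musician_john = ("EvaluationLink", ("PredicateNode", "is-musician"), john)

a = ("AndLink", friends, musician_y, musician_john)
b = ("AndLink", musician_john, friends, musician_y)
print(canonical(a) == canonical(b))
```

Ordered Links such as ListLink are left untouched, so (John, $Y) and ($Y, John) remain distinct.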

Thanks,

Shujing

pln pattern smaples.scm

Ben Goertzel

Jun 14, 2017, 10:01:52 AM

to Shujing Ke, Nil Geisweiller, opencog

Hi,

I'm busy tonight and tomorrow but can look at this Friday, unless Nil
has studied it first ;)

About

> GroundedSchemaNode and TypeNode are not considered to become VariableNodes.
> Seems it doesn't make a lot of sense to make them into variables.

well it does make sense to make these into variables, but it may not
be urgent...

In dependent type theory one has lots of type-valued variables, it's
actually critical to programming in languages like Agda that are based
on dependent types and Curry-Howard correspondence, etc.

About pattern gram, I'm a little confused. We do want patterns
involving more than one PatternVariableNode, in mining patterns from
the PLN inference histories... don't we?

Shujing Ke

Jun 14, 2017, 6:20:08 PM

to Ben Goertzel, Nil Geisweiller, opencog

Pattern gram in the current implementation is the number of root Links a pattern contains. A 1-gram pattern can contain very many variable nodes if there are many Links nested in the root Link, in this case the ExecutionOutputLink.

Shujing Ke

Jun 18, 2017, 6:10:41 PM

to Ben Goertzel, Nil Geisweiller, opencog

Has anyone looked at the sample file yet?

Because each Link is big, my machine cannot finish mining all the patterns from the whole corpus with the current settings. It would be nice if anyone could give more ideas about what kinds of patterns should be kept and what kinds can be filtered out in the sample file.

Thanks,

Shujing

Ben Goertzel

Jun 19, 2017, 12:27:07 AM

to opencog, Nil Geisweiller

Hi Nil,

We need to get back to Shujing on all these issues ASAP so she can proceed.

I will think/look carefully regarding the Interestingness Measure
issue, as the probabilistic surprisingness aspect of this is something
I have thought about a lot...

If you can look at the other issues carefully today or tomorrow, that
will be valuable... thanks a lot

Shujing: About the definition of "pattern gram", OK I see how the code
works now. But it seems to me that counting the number of "root
links" is a bit arbitrary, isn't it? The number of Atoms in a
pattern altogether is meaningful, and the number of variables in the
pattern is meaningful (and I guess how "loose" are the type
restrictions on these variables, is also meaningful). But the number
of root links doesn't mean much, as a root link can supervene over a
huge or tiny tree of Atoms, right?
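
To illustrate the difference between these measures, the sketch below computes, for a pattern given as a list of root links, its gram (number of root links), its number of distinct Atoms, and its number of distinct variables. The tuple encoding and the `pattern_measures` helper are assumptions made for this example. It compares the 2-gram DBPedia pattern with a single deduction ExecutionOutputLink.

```python
# Atoms as nested tuples (assumption): node = (type, name), link = (type, child, ...).

def is_node(atom):
    return len(atom) == 2 and isinstance(atom[1], str)

def subatoms(atom, acc):
    """Add atom and everything nested inside it to the set acc."""
    acc.add(atom)
    if not is_node(atom):
        for child in atom[1:]:
            subatoms(child, acc)
    return acc

def pattern_measures(roots):
    atoms = set()
    for root in roots:
        subatoms(root, atoms)
    variables = {a for a in atoms if is_node(a) and a[0] == "VariableNode"}
    return {"gram": len(roots), "atoms": len(atoms), "variables": len(variables)}

v = ("VariableNode", "var_1")
two_gram = [
    ("EvaluationLink", ("PredicateNode", "Country"),
     ("ListLink", v, ("ConceptNode", "USA"))),
    ("EvaluationLink", ("PredicateNode", "Language"),
     ("ListLink", v, ("ConceptNode", "English"))),
]

X, B, D = ("VariableNode", "$X"), ("VariableNode", "$B"), ("ConceptNode", "D")
one_gram = [
    ("ExecutionOutputLink",
     ("GroundedSchemaNode", "scm: bc-deduction-formula"),
     ("ListLink",
      ("InheritanceLink", X, D),
      ("InheritanceLink", X, B),
      ("InheritanceLink", B, D))),
]

print(pattern_measures(two_gram))
print(pattern_measures(one_gram))
```

In this encoding both patterns contain nine distinct Atoms, yet one is a 2-gram with one variable and the other a 1-gram with two variables.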

Hypergraphs are complicated ;p ...

ben

Nil Geisweiller

Jun 19, 2017, 4:05:51 AM

to Shujing Ke, opencog, Ben Goertzel

Hi,

sorry, I was trapped in a variadic-template-hole. I managed to escape
but not without bringing back a vicious gift. Now everything looks like
a variadic template to me.

On 06/14/2017 04:32 PM, Shujing Ke wrote:
> 3)GroundedSchemaNode and TypeNode are not considered to become
> VariableNodes. Seems it doesn't make a lot of sense to make them into
> variables. Also, if they become Variables, there are errors because the
> atom core will check the types and get the opencog::SyntaxException below:
> Unexpected contents in TypedVariableLink Expected type specifier
> (e.g. TypeNode, TypeChoice, etc.), got PatternVariableNode
> and
> ExecutionOutputLink must have schema! Got (PatternVariableNode ...

These errors should go away if you quote them.

>
> 4)In current stage, both VariableNode and PatternVariableNode are used
> to distinguish the Variables generated by Pattern Miner and the original
> Variables. We can unfiy them later if need.

As soon as these variables are scoped they can be unified. I don't mind
retaining PatternVariableNode if it has specific semantics, otherwise it
should go.

>
> *2. About pattern gram*

> Actually in this application, only 1-gram patterns are wanted. In
> previous applications, like mining from DBPedia and ConceptNet, an n-gram
> pattern contains n Links, but each Link is relatively small, like one
> EvaluationLink or InheritanceLink. For example:
> EvaluationLink
>   PredicateNode "Country"
>   ListLink
>     "var_1"
>     "USA"
> EvaluationLink
>   PredicateNode "Language"
>   ListLink
>     "var_1"
>     "English"
> Above is a 2-gram pattern, because each fact in DBPedia is just one
> EvaluationLink. These two Links are connected via the VariableNode
> "var_1". But in the PLN corpus case, each ExecutionOutputLink is big;
> it contains a lot of Links. It seems we don't really want
> patterns that contain multiple ExecutionOutputLinks in one single
> pattern; rather, we want to find the common abstract patterns of each single
> ExecutionOutputLink. So it is actually just 1-gram.

I'm confused, is it a 2-gram because the pattern is the conjunction of 2
EvaluationLinks, or because each pattern has 2 links in it
(EvaluationLink and ListLink)?

BTW, I think you could use AndLink root links to denote conjunctions of
patterns, like in the pattern matcher, so for instance the pattern above
would be

SatisfyingSetScopeLink
  Variable "var_1"
  AndLink
    EvaluationLink
      PredicateNode "Country"
      ListLink
        "var_1"
        "USA"
    EvaluationLink
      PredicateNode "Language"
      ListLink
        "var_1"
        "English"

In case you want to represent a pattern of AndLink you could local quote
it, like

SatisfyingSetScopeLink
  Variable "var_1"
  LocalQuoteLink
    AndLink
      EvaluationLink
        PredicateNode "Country"
        ListLink
          "var_1"
          "USA"
      EvaluationLink
        PredicateNode "Language"
        ListLink
          "var_1"
          "English"

like in the pattern matcher. What do you think?
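
To make the conjunction reading concrete, here is a minimal Python sketch of what it means for "var_1" to satisfy both clauses at once. It is only an illustration: the triples stand in for EvaluationLinks, the names (Alice, Bob, Carol) and the `groundings` helper are made up, and nothing here is the pattern matcher's or pattern miner's actual API.

```python
# Toy facts as (predicate, subject, object) triples, standing in for EvaluationLinks.
facts = {
    ("Country", "Alice", "USA"),
    ("Language", "Alice", "English"),
    ("Country", "Bob", "USA"),
    ("Language", "Bob", "Spanish"),
    ("Country", "Carol", "France"),
    ("Language", "Carol", "English"),
}

def groundings(clauses, facts):
    """Return the values of var_1 that satisfy every clause of the conjunction.

    Each clause is (predicate, object); var_1 sits in the subject slot.
    """
    candidate_sets = [
        {subj for (pred, subj, obj) in facts if pred == p and obj == o}
        for (p, o) in clauses
    ]
    return set.intersection(*candidate_sets) if candidate_sets else set()

# The 2-gram pattern from the example: Country(var_1, USA) AND Language(var_1, English).
pattern = [("Country", "USA"), ("Language", "English")]
matches = groundings(pattern, facts)
print(sorted(matches))
```

The number of groundings found this way plays the role of the pattern's count.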

> *3. The interestingness evaluation is different from previous applications*

> Our interestingness evaluation is based on a surprisingness measure, which
> includes Surprisingness_I and Surprisingness_II:
> Surprisingness_I: how difficult it is to infer the actual frequency of an
> n-gram pattern from the frequencies of all its (n-1)-gram to 1-gram subpatterns.
> Surprisingness_II: how difficult it is to infer the actual frequency of an
> n-gram pattern from the frequencies of all its (n+1)-gram superpatterns.
> But in the PLN corpus, we only mine 1-grams, and I guess the interesting
> patterns you want to identify here are the patterns of "the max degree of
> abstraction", for example:
> pattern1: (x and y are friends) (x is musician) (y is musician) (z is
> musician) (z and y are friends)->(x and z are friends)
> pattern2: (x and y are friends) (x is var_job) (y is var_job) (z is
> var_job) (z and y are friends)->(x and z are friends)
> If pattern 1 occurs 10 times and pattern 2 also occurs 10 times, it means
> that pattern 2 only holds when var_job = musician, so abstracting
> to pattern 2 makes no sense. So pattern 1 is already the max
> degree of abstraction in this case. If my understanding is right, then I
> will need to write a new interestingness evaluation for this, because it
> is different from the surprisingness measure.

That sounds right but I think I need to understand better how the
pattern miner algorithm operates. I'll look into that soon.
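
The "max degree of abstraction" criterion quoted above can be sketched in a few lines of Python. This is only one possible reading of it, not the pattern miner's code: the `redundant_abstractions` helper and the dictionary layout are hypothetical, and the counts are the ones from the example.

```python
def redundant_abstractions(counts, specializes):
    """Return the abstract patterns that add nothing over a more specific one.

    counts:      pattern name -> number of occurrences
    specializes: list of (specific, abstract) pairs, where `abstract` is
                 obtained from `specific` by turning a constant into a variable
    An abstraction is considered redundant when it matches exactly as many
    instances as the specific pattern it generalizes.
    """
    return {abstract for specific, abstract in specializes
            if counts[abstract] == counts[specific]}

# Case from the example: both patterns occur 10 times, so pattern 2 is redundant.
counts = {"pattern1": 10, "pattern2": 10}
specializes = [("pattern1", "pattern2")]
redundant = redundant_abstractions(counts, specializes)

# Contrasting case: if var_job also matched other jobs, the abstraction would be kept.
counts_varied = {"pattern1": 10, "pattern2": 25}
kept = redundant_abstractions(counts_varied, specializes)
print(redundant, kept)
```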

>
> *4. Unify the link order in unordered Links*

> At the current stage, I haven't unified the order of Links in unordered Links
> in the corpus, like AndLink. For example:
> In the AndLink below from the corpus, the 3 EvaluationLinks may
> be in a different order in different instances, which will affect
> the structure of the AndLink, but they should actually be order
> independent. So if this situation does exist in the PLN corpus or
> other applications in future, I may need to unify the order of the input
> Links into a unique order (just like the pattern isomorphism problem I
> solved before in the pattern miner), but I am not sure I can find a way to
> give all of them a unique order; in the worst case, we have to
> generate all the possible combinations for each unordered Link in its
> patterns.
>
> (AndLink
>   (EvaluationLink
>     (PredicateNode "are-friends")
>     (ListLink
>       (ConceptNode "John")
>       (VariableNode "$Y-37aad5ea")))
>   (EvaluationLink
>     (PredicateNode "is-musician")
>     (VariableNode "$Y-37aad5ea"))
>   (EvaluationLink
>     (PredicateNode "is-musician")
>     (VariableNode "John")))

Yes, this is gonna be needed, although for now it can wait I think.
Regarding how to solve it, I'm afraid you're gonna have to consider all
permutations, like the unifier and pattern matcher do.
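
As a rough illustration of the two options discussed here (a unique order versus all permutations), the following Python sketch uses nested tuples as stand-ins for atoms. The tuple encoding, the `UNORDERED` set and both helpers are assumptions made for the example. Sorting children is enough when the atoms are literally identical up to order; once variables may be renamed between instances, sorting alone no longer suffices and the permutation fallback is the safe route.

```python
from itertools import permutations

# Assumed subset of unordered link types, for illustration only.
UNORDERED = {"AndLink", "OrLink", "SetLink"}

def canonical(atom):
    """Return `atom` with the children of every unordered link sorted.

    Atoms are nested tuples: nodes are ("Type", "name"), links are
    ("Type", child, child, ...).
    """
    head, *children = atom
    if all(isinstance(c, str) for c in children):
        return atom  # a node (or an empty link): nothing to reorder
    kids = [canonical(c) for c in children]
    if head in UNORDERED:
        kids.sort(key=repr)  # any deterministic total order will do
    return (head, *kids)

def all_orderings(link):
    """Brute-force fallback: every ordering of a link's direct children."""
    head, *children = link
    return {(head, *p) for p in permutations(children)}

friends = ("EvaluationLink", ("PredicateNode", "are-friends"),
           ("ListLink", ("ConceptNode", "John"), ("VariableNode", "$Y")))
y_musician = ("EvaluationLink", ("PredicateNode", "is-musician"), ("VariableNode", "$Y"))
j_musician = ("EvaluationLink", ("PredicateNode", "is-musician"), ("ConceptNode", "John"))

a = ("AndLink", friends, y_musician, j_musician)
b = ("AndLink", j_musician, friends, y_musician)
print(canonical(a) == canonical(b))
print(len(all_orderings(a)))
```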

Nil


Nil Geisweiller

Jun 19, 2017, 4:24:41 AM

to Nil Geisweiller, Shujing Ke, opencog, Ben Goertzel

Shujing, just please don't set the confidences inside the pattern to 1.0
like in

(ExecutionOutputLink (stv 1.000000 1.000000)
  (GroundedSchemaNode "scm: bc-deduction-formula" (stv 1.000000 1.000000))
  (ListLink (stv 1.000000 1.000000)
    (InheritanceLink (stv 1.000000 1.000000)
      (PatternVariableNode "$var_1" (stv 1.000000 1.000000))
      (PatternVariableNode "$var_2" (stv 1.000000 1.000000))
    )
    (InheritanceLink (stv 1.000000 1.000000)
      (PatternVariableNode "$var_1" (stv 1.000000 1.000000))
      (PatternVariableNode "$var_3" (stv 1.000000 1.000000))
    )
    (InheritanceLink (stv 1.000000 1.000000)
      (PatternVariableNode "$var_3" (stv 1.000000 1.000000))
      (PatternVariableNode "$var_2" (stv 1.000000 1.000000))
    )
  )
)

this will confuse PLN. The only greater-than-zero confidence should be
on the link wrapping the pattern, SatisfyingSetScopeLink or such.

Other than that they look OK. Of course they should be wrapped in an
appropriate link and given a meaningful TV. I'm thinking we probably
want to use a conditional link, an implication, inheritance, or perhaps
context link, as opposed to a SatisfyingSetScopeLink, but we can figure
that out slightly later.

Nil

Nil Geisweiller

Jun 19, 2017, 5:01:12 AM

to Nil Geisweiller, Shujing Ke, opencog, Ben Goertzel

Actually patterns involving scopes require quote links. Let me consider
the following pattern (the simplest of that sort I could find):

;Pattern: Frequency = 6
(ExecutionOutputLink (stv 1.000000 1.000000)
  (GroundedSchemaNode "scm: conditional-full-instantiation-formula" (stv 1.000000 1.000000))
  (ListLink (stv 1.000000 1.000000)
    (EvaluationLink (stv 1.000000 1.000000)
      (PatternVariableNode "$var_1" (stv 1.000000 1.000000))
      (ListLink (stv 1.000000 1.000000)
        (PatternVariableNode "$var_2" (stv 1.000000 1.000000))
        (PatternVariableNode "$var_3" (stv 1.000000 1.000000))
      )
    )
    (ImplicationScopeLink (stv 1.000000 1.000000)
      (VariableList (stv 1.000000 1.000000)
        (TypedVariableLink (stv 1.000000 1.000000)
          (VariableNode "$X" (stv 1.000000 1.000000))
          (TypeNode "ConceptNode" (stv 1.000000 1.000000))
        )
        (TypedVariableLink (stv 1.000000 1.000000)
          (PatternVariableNode "$var_4" (stv 1.000000 1.000000))
          (TypeNode "ConceptNode" (stv 1.000000 1.000000))
        )
      )
      (EvaluationLink (stv 1.000000 1.000000)
        (PatternVariableNode "$var_1" (stv 1.000000 1.000000))
        (ListLink (stv 1.000000 1.000000)
          (VariableNode "$X" (stv 1.000000 1.000000))
          (PatternVariableNode "$var_4" (stv 1.000000 1.000000))
        )
      )
      (EvaluationLink (stv 1.000000 1.000000)
        (PatternVariableNode "$var_1" (stv 1.000000 1.000000))
        (ListLink (stv 1.000000 1.000000)
          (PatternVariableNode "$var_4" (stv 1.000000 1.000000))
          (VariableNode "$X" (stv 1.000000 1.000000))
        )
      )
    )
    (EvaluationLink (stv 1.000000 1.000000)
      (PatternVariableNode "$var_1" (stv 1.000000 1.000000))
      (ListLink (stv 1.000000 1.000000)
        (PatternVariableNode "$var_3" (stv 1.000000 1.000000))
        (PatternVariableNode "$var_2" (stv 1.000000 1.000000))
      )
    )
  )
)

Let me

1. wrap it in a SatisfyingSetScopeLink
2. attempt to add a meaningful TV
3. remove the non-zero confidences elsewhere
4. add the required quotes

(SatisfyingSetScopeLink <strength=6/N, count=N> ;; not sure about N
  (VariableList
    (TypedVariableLink
      (PatternVariableNode "$var_1")
      (TypeNode "PredicateNode")
    )
    (TypedVariableLink
      (PatternVariableNode "$var_2")
      (TypeNode "ConceptNode")
    )
    (TypedVariableLink
      (PatternVariableNode "$var_3")
      (TypeNode "ConceptNode")
    )
    (TypedVariableLink
      (PatternVariableNode "$var_4")
      (TypeNode "VariableNode")
    )
  )
  (Quote
    (ExecutionOutputLink
      (GroundedSchemaNode "scm: conditional-full-instantiation-formula")
      (ListLink
        (EvaluationLink
          (UnquoteLink (PatternVariableNode "$var_1"))
          (ListLink
            (UnquoteLink (PatternVariableNode "$var_2"))
            (UnquoteLink (PatternVariableNode "$var_3"))
          )
        )
        (ImplicationScopeLink
          (VariableList
            (TypedVariableLink
              (VariableNode "$X")
              (TypeNode "ConceptNode")
            )
            (TypedVariableLink
              (UnquoteLink (PatternVariableNode "$var_4"))
              (TypeNode "ConceptNode")
            )
          )
          (EvaluationLink
            (UnquoteLink (PatternVariableNode "$var_1"))
            (ListLink
              (VariableNode "$X")
              (UnquoteLink (PatternVariableNode "$var_4"))
            )
          )
          (EvaluationLink
            (UnquoteLink (PatternVariableNode "$var_1"))
            (ListLink
              (UnquoteLink (PatternVariableNode "$var_4"))
              (VariableNode "$X")
            )
          )
        )
        (EvaluationLink
          (UnquoteLink (PatternVariableNode "$var_1"))
          (ListLink
            (UnquoteLink (PatternVariableNode "$var_3"))
            (UnquoteLink (PatternVariableNode "$var_2"))
          )
        )
      )
    )
  )
)

Nil

Nil Geisweiller

Jun 19, 2017, 5:38:32 AM

to Nil Geisweiller, Shujing Ke, opencog, Ben Goertzel

Shujing, in

<OPENCOG_ROOT>/opencog/learning/PatternMiner/types/atom_types.script

you've defined

PATTERN_LINK <- UNORDERED_LINK

but such a link type already exists in

<ATOMSPACE_ROOT>/opencog/atoms/base/atom_types.script

Nil

Ben Goertzel

Jun 19, 2017, 12:49:07 PM

to Shujing Ke, Nil Geisweiller, opencog

(Nil, please look at the end of this email, I have a suggestion for
you there...)

On Wed, Jun 14, 2017 at 9:32 PM, Shujing Ke <shuj...@gmail.com> wrote:
> 3. The interestingness evaluation is different from previous applications
> Our interestingness evaluation is based on a surprisingness measure, which
> includes Surprisingness_I and Surprisingness_II:
> Surprisingness_I: how difficult it is to infer the actual frequency of an
> n-gram pattern from the frequencies of all its (n-1)-gram to 1-gram subpatterns.
> Surprisingness_II: how difficult it is to infer the actual frequency of an
> n-gram pattern from the frequencies of all its (n+1)-gram superpatterns.
> But in the PLN corpus, we only mine 1-grams, and I guess the interesting
> patterns you want to identify here are the patterns of "the max degree of
> abstraction", for example:
> pattern1: (x and y are friends) (x is musician) (y is musician) (z is
> musician) (z and y are friends)->(x and z are friends)
> pattern2: (x and y are friends) (x is var_job) (y is var_job) (z is var_job)
> (z and y are friends)->(x and z are friends)
> If pattern 1 occurs 10 times and pattern 2 also occurs 10 times, it means that
> pattern 2 only holds when var_job = musician, so abstracting
> to pattern 2 makes no sense. So pattern 1 is already the max degree of
> abstraction in this case. If my understanding is right, then I will need to
> write a new interestingness evaluation for this, because it is different from
> the surprisingness measure.

Hmm... well I am not sure if I am interpreting your example right...

Is the idea of the example that x, y and z are pattern-miner
variables, whereas var_job is an Atomspace VariableNode (not, in the
current implementation, what would be a PatternVariableNode)?

In that case ...let's consider a simpler example

pattern 1 = rich(x) and cute(y) and married(x,y)

pattern 2 = ThereExists z: rich(x) and z(y) and married(x,y)

pattern 3 = ForAll z: rich(x) and z(y) and married(x,y)

So in each of these cases, as I intend them: Everything except the x
and y is assumed to be there in the Atomspace. Only the x and y are
the PatternVariables...

If we have 100 occurrences of the pattern

ThereExists z: rich(x) and z(y) and married(x,y)

i.e. 100 cases such as

ThereExists z: rich(Bill) and z(Jane) and married(Bill, Jane)

ThereExists z: rich(Mary) and z(Kate) and married(Mary, Kate)

... etc.

-- then this is a valid pattern, right?

A more realistic example would be in calculus where you have many patterns like

ForAll epsilon: epsilon>0, ( ThereExists delta: ( delta>0 and
abs(x-y)<delta ==> abs( f(x) - f(y)) < epsilon))

ForAll epsilon: epsilon>0, ( ThereExists delta: ( delta>0 and
abs(x-y)<delta ==> abs( g(x) - g(y)) < epsilon))

...

where "epsilon" and "delta" and "x" and "y" are (in an OpenCog
representation) VariableNodes ...

So in this case, f and g are the constants being matched by
PatternVariables, so we could have a PatternVariable $PV and the
pattern miner could recognize the pattern

ForAll epsilon: epsilon>0, ( ThereExists delta: ( delta>0 and
abs(x-y)<delta ==> abs( $PV(x) - $PV(y)) < epsilon))

This is a nice pattern to find, as it's a pattern that exists in lots
of calculus proofs (it just says $PV is continuous...) ... it's no
problem that the pattern has lots of quantified variables and
quantifiers in it...

...

In the PLN case, if we take an example possible pattern like "two
deductions in a row, involving associated entities, are often useful"
that would look like

A ==> B, B==>C |- A==>C
A==>C, C ==> D |- A ==>D
HebbianLink (D,B)
useful(A==>D)

So the first two of these 4 lines are going to be embedded in a single
ExecutionOutputLink, I guess.... Then the other two will be their own
separate links in the Atomspace...

Suppose this pattern occurs 10 times in the Atomspace. Each of these
times, we will have different Atoms in the slots for A, B, C, D. Some
of these may be complex, e.g. we might have in one case

A equals

MemberLink
  VariableNode $X
  SatisfyingSet
    EvaluationLink
      PredicateNode "piece of poop"
      ListLink
        $X
        ConceptNode "cheese doodle"

or whatever... In this case the fact that there's a VariableNode $X
in the interior of A doesn't matter.

Nil, it will take some work, but maybe it's worthwhile for you to
create a test Atomspace in which my above example pattern

A ==> B, B==>C |- A==>C
A==>C, C ==> D |- A ==>D
HebbianLink (D,B)
useful(A==>D)

is a surprising pattern, and in which some of the examples of A, B, C
or D have some complexity to them (some internal quantified
variables).

Having a more "real" example like this might help avoid any confusion
and aid Shujing in getting the pattern miner to work on PLN inference
histories in a useful way

Nil Geisweiller

Jun 19, 2017, 2:29:27 PM

to Ben Goertzel, Shujing Ke, Nil Geisweiller, opencog

Ben,

On 06/19/2017 07:49 PM, Ben Goertzel wrote:
> In the PLN case, if we take an example possible pattern like "two
> deductions in a row, involving associated entities, are often useful"
> that would look like
>
> A ==> B, B==>C |- A==>C
> A==>C, C ==> D |- A ==>D
> HebbianLink (D,B)
> useful(A==>D)
>
> So the first two of these 4 lines are going to be embedded in a single
> ExecutionOutputLink, I guess.... Then the other two will be their own
> separate links in the Atomspace...

Indeed, so it would be a 4-gram pattern (if I understand correctly).

>
> Suppose this pattern occurs 10 times in the Atomspace. Each of these
> times, we will have different Atoms in the slots for A, B, C, D. Some
> of these may be complex, e.g. we might have in one case
>
> A equals
>
> MemberLink
>   VariableNode $X
>   SatisfyingSet
>     EvaluationLink
>       PredicateNode "piece of poop"
>       ListLink
>         $X
>         ConceptNode "cheese doodle"
>
>
> or whatever... In this case the fact that there's a VariableNode $X
> in the interior of A doesn't matter.

Indeed. If the implication A==>B is an ImplicationScopeLink, the pattern
miner should abstract that into

QuoteLink
  ImplicationScopeLink
    UnquoteLink
      VariableNode "$variable-declaration"
    UnquoteLink
      VariableNode "$A"
    UnquoteLink
      VariableNode "$B"

and it doesn't matter what variables appear inside A.

>
> Nil, it will take some work, but maybe it's worthwhile for you to
> create a test Atomspace in which my above example pattern
>
> A ==> B, B==>C |- A==>C
> A==>C, C ==> D |- A ==>D
> HebbianLink (D,B)
> useful(A==>D)

What do you mean exactly by "useful(A==>D)"?

If you mean that A==>D is a pattern abstracting previously successfully
proved backward chainer targets, then maybe we want the pattern miner to
output conditional patterns, so that the resulting pattern wouldn't be a
SatisfyingSetScopeLink but rather, say, an ImplicationScopeLink.

So that we'd ask the following pattern miner query

ImplicationScopeLink
  V
  Y
  useful(X)

where V, X and Y are meta-pattern-matcher variables as they represent
patterns that the pattern miner should come up with (of course all this
should be properly quoted), which looks very much like a Cognitive
Schematic. So in fact inference control would turn a bit into a
specialized OpenPsi process, but perhaps I digress...

>
> is a surprising pattern, and in which some of the examples of A, B, C
> or D have some complexity to them (some internal quantified
> variables).
>
> Having a more "real" example like this might help avoid any confusion
> and aid Shujing in getting the pattern miner to work on PLN inference
> histories in a useful way

Sure, I'll see what I can come up with. I'm also gonna keep studying the
pattern miner because some stuff is still a bit abstract to me.

Nil


Nil Geisweiller

Jun 19, 2017, 2:41:37 PM

to Nil Geisweiller, Ben Goertzel, Shujing Ke, opencog

On 06/19/2017 09:29 PM, Nil Geisweiller wrote:
> ImplicationScopeLink
> V
> Y
> useful(X)
>
> where V, X and Y are meta-pattern-matcher variables as they represent
> patterns that the pattern miner should come up with (of course all this
> should be properly quoted), which looks very much like a Cognitive
> Schematic. So in fact inference control would turn a bit into a
> specialized OpenPsi process, but perhaps I digress...

In case it wasn't clear, I don't mean that the pattern miner would try
to find such frequent ImplicationScopeLink, there might be none, rather
it would look for any pattern, but output the important ones within that
format, if it makes sense, with the right TV assigned to them. Or maybe
this is beyond the pattern miner job, and it should be delegated to
another process.

Anyway, maybe after studying the pattern miner in depth all this will
become obvious.

Nil

Ben Goertzel

Jun 19, 2017, 10:48:42 PM

to Nil Geisweiller, Shujing Ke, opencog

On Tue, Jun 20, 2017 at 2:29 AM, Nil Geisweiller
<ngei...@googlemail.com> wrote:
> What do you mean exactly by "useful(A==>D)"?

What I was thinking was: If the implication [666], e.g.

ImplicationLink [handle=666]
  EvaluationLink
    PredicateNode "eat"
    ListLink
      ConceptNode "Ben"
      ConceptNode "cockroach"
  InheritanceLink
    ConceptNode "Ben"
    ConceptNode "weird"

was used or created by the BC, and was found to be useful for whatever
inference the BC was doing when it used or created [666], then the
utility of this link should be annotated via

EvaluationLink
  PredicateNode "useful"
  ListLink
    [666]
    [111]

where [111] is the handle of the target of the BC inference the BC was
doing when it created [666].

So maybe my example should look more like

A ==> B, B==>C |- A==>C
A==>C, C ==> D |- A ==>D
HebbianLink (D,B)

useful(A==>D, T)

where T is a variable that matches the target of prior BC inferences...

Shujing Ke

Jun 20, 2017, 8:30:03 PM

to Ben Goertzel, Nil Geisweiller, opencog

Hi, Ben and Nil,

Thanks for all your responses. I may be a bit slow this week: it is too warm here and my baby is sick; he has barely eaten or drunk anything since yesterday morning.

1. About the output format and TV of patterns

The pattern miner will output the raw patterns found in the input data (without further processing), because different modules in OpenCog and different applications may require different output formats; there shouldn't be only one output format. For now we can base our discussion on the raw pattern format. After we make sure the contents of the patterns are right, we can discuss the output formats for different modules. If I have time then, I can implement them; if I don't, I think it should also be easy for each module's developer to turn the raw patterns into the format they want. It is better for this to live in another layer outside the pattern miner, which makes it more convenient for each module to modify the pattern format it needs in future. Otherwise, whenever a module wants to change some format, it has to modify the pattern miner core.

2. About the pattern gram

Actually the gram doesn't exactly indicate the size of a pattern; it just means the number of root links in a pattern.

A ==> B, B==>C |- A==>C
A==>C, C ==> D |- A ==>D
HebbianLink (D,B)

useful(A==>D)

Yes, it could be a 4-gram, but it can also be a 1-gram, depending on the input data.

If you have a big Link like:

ImplicationLink
  AndLink
    ImplicationLink A B
    ImplicationLink B C
    ImplicationLink C D
  ImplicationLink A D

Then this pattern will be 1-gram.

Take the cockroach pattern as a further example:

Suppose you have handles 666 and 777:

ImplicationLink [handle=666]
  EvaluationLink
    PredicateNode "eat"
    ListLink
      ConceptNode "Ben"
      ConceptNode "cockroach"
  InheritanceLink
    ConceptNode "Ben"
    ConceptNode "weird"

ImplicationLink [handle=777]
  EvaluationLink
    PredicateNode "eat"
    ListLink
      ConceptNode "Nil"
      ConceptNode "cockroach"
  InheritanceLink
    ConceptNode "Nil"
    ConceptNode "weird"

If only ImplicationLinks are allowed to be root links, then pattern 1 below is a 1-gram pattern:

Pattern 1:

ImplicationLink
  EvaluationLink
    PredicateNode "eat"
    ListLink
      ConceptNode "var1"
      ConceptNode "cockroach"
  InheritanceLink
    ConceptNode "var1"
    ConceptNode "weird"

If EvaluationLinks and InheritanceLinks are also allowed to be root links, then patterns 2, 3 and 4 are all 2-gram patterns, because they contain two root links. Of course, in this case, patterns 3 and 4 do not make much sense, but in the DBpedia data, these types of patterns are what we want. So we need to specify which link types should be root links for different applications, to avoid mining a lot of useless patterns. This can be set in the config file or the scm interface through the white and black link type lists.

Pattern 2:

EvaluationLink
  PredicateNode "eat"
  ListLink
    ConceptNode "var1"
    ConceptNode "cockroach"

InheritanceLink
  ConceptNode "var1"
  ConceptNode "weird"

Pattern 3:

EvaluationLink
  PredicateNode "eat"
  ListLink
    ConceptNode "var1"
    ConceptNode "cockroach"

ImplicationLink
  EvaluationLink
    PredicateNode "eat"
    ListLink
      ConceptNode "var1"
      ConceptNode "cockroach"
  InheritanceLink
    ConceptNode "var1"
    ConceptNode "weird"

Pattern 4:

ImplicationLink
  EvaluationLink
    PredicateNode "eat"
    ListLink
      ConceptNode "Ben"
      ConceptNode "var1"
  InheritanceLink
    ConceptNode "Ben"
    ConceptNode "var2"

InheritanceLink
  ConceptNode "Nil"
  ConceptNode "var2"
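
The role of the root-link type list in point 2 can be sketched in Python as follows. This is a simplified illustration only: atoms are encoded as nested tuples, and `candidate_roots` and `gram` are hypothetical helpers, not functions of the pattern miner.

```python
def candidate_roots(atom, allowed_types):
    """Yield every link in `atom` (itself included) whose type may serve as a root link."""
    head, *children = atom
    if head in allowed_types:
        yield atom
    for child in children:
        if isinstance(child, tuple):
            yield from candidate_roots(child, allowed_types)

def gram(pattern):
    """A pattern is a list of root links; its gram is simply how many there are."""
    return len(pattern)

# The link with handle 666 from the example, as nested tuples.
implication_666 = (
    "ImplicationLink",
    ("EvaluationLink",
     ("PredicateNode", "eat"),
     ("ListLink", ("ConceptNode", "Ben"), ("ConceptNode", "cockroach"))),
    ("InheritanceLink", ("ConceptNode", "Ben"), ("ConceptNode", "weird")),
)

# Only ImplicationLinks may be roots: a single candidate root link.
only_implications = list(candidate_roots(implication_666, {"ImplicationLink"}))

# EvaluationLinks and InheritanceLinks may be roots too: three candidates,
# from which 2-gram patterns such as patterns 2, 3 and 4 can be assembled.
with_inner_links = list(candidate_roots(
    implication_666, {"ImplicationLink", "EvaluationLink", "InheritanceLink"}))

print(gram(only_implications), gram(with_inner_links))
```

Restricting the allowed types shrinks the set of candidate roots, which is what keeps the number of mined patterns under control.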

3. About unifying link order in unordered links in input data

It probably won't take too much time to code, because it should be quite similar to the logic of the pattern isomorphism identification algorithm which I already have in the pattern miner, as it is quite an important part of the pattern miner. I should be able to reuse the logic.

4. About the interestingness evaluation

I didn't quite get the meaning of the rich(x) and z(y) and married(x,y) example.
I think it is also related to the pattern gram. Take the 2 patterns below, where x, y, z are variables:
pattern A: rich(x) and z(y) and married(x,y)

pattern B: rich(x) and cute(y) and married(x,y)

If they are represented as 3-gram patterns, then it may be possible to evaluate their interestingness just by surprisingness:
pattern A:

InheritanceLink x rich
InheritanceLink y z
EvaluationLink married x y

pattern B:

InheritanceLink x rich
InheritanceLink y cute
EvaluationLink married x y

If they are represented as 1-gram patterns, then I can implement an interestingness evaluation based on the variables inside one root link.
pattern A:

ImplicationLink
  AndLink
    InheritanceLink x rich
    InheritanceLink y z
  EvaluationLink married x y

pattern B:

ImplicationLink
  AndLink
    InheritanceLink x rich
    InheritanceLink y z
  EvaluationLink married x y

5. A suggestion to make up a very simple tiny test data file

I suggest Nil make up a simple test data file just to test whether the output patterns are what you want and whether the frequency counts are correct. For example, I made up a simple data set before, the ugly-man-drink-soda file, which contains 10 men and 10 women; among them 5 women and 5 men are ugly, and also 5 women and 5 men drink soda. It is expected to find the pattern "ugly men drink soda". For such a tiny file, we can actually check every output pattern and its count to see if there is any bug. If it passes, then we can apply the miner to a big corpus. Otherwise, there are too many outputs for a big corpus and it is hard to examine the result.
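
A tiny corpus of this kind is small enough to generate and check by hand. The Python sketch below builds one under an assumed layout (the ugly men are exactly the soda-drinking men, while the ugly women and the soda-drinking women do not overlap); the original ugly-man-drink-soda file may be laid out differently, and the `count` helper is purely illustrative.

```python
# Build 10 men and 10 women; 5 of each are ugly and 5 of each drink soda.
# Assumed layout: ugliness and soda coincide for men but not for women.
people = []
for i in range(10):
    people.append({"name": f"man_{i}", "sex": "man", "ugly": i < 5, "soda": i < 5})
    people.append({"name": f"woman_{i}", "sex": "woman", "ugly": i < 5, "soda": i >= 5})

def count(**conditions):
    """Count the individuals matching every given attribute value."""
    return sum(all(p[k] == v for k, v in conditions.items()) for p in people)

# Every expected count can be verified by hand on a corpus this small.
print(count(sex="man"), count(ugly=True), count(soda=True))
print(count(sex="man", ugly=True, soda=True))
print(count(sex="woman", ugly=True, soda=True))
```

With this layout, a miner run on the equivalent Atomese data should report the "ugly man drinks soda" conjunction with a count of 5 and the corresponding pattern for women with a count of 0.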

Thanks,

Shujing

Shujing Ke

Jun 20, 2017, 8:31:39 PM

to Ben Goertzel, Nil Geisweiller, opencog

Correcting a typo:

in point 4 of the previous email,
pattern B should be:

ImplicationLink
  AndLink
    InheritanceLink x rich
    InheritanceLink y cute
  EvaluationLink married x y

Shujing Ke

Jun 20, 2017, 8:40:24 PM

to Ben Goertzel, Nil Geisweiller, opencog

Actually, for point 2 in the previous email, a clearer example of a 2-gram pattern when only ImplicationLinks are allowed to be root links is pattern 5 below:

Pattern 5:

ImplicationLink
  EvaluationLink
    PredicateNode "var1"
    ListLink
      ConceptNode "Ben"
      ConceptNode "var2"
  InheritanceLink
    ConceptNode "Ben"
    ConceptNode "var3"

ImplicationLink
  EvaluationLink
    PredicateNode "var1"
    ListLink
      ConceptNode "Nil"
      ConceptNode "var2"
  InheritanceLink
    ConceptNode "Nil"
    ConceptNode "var3"

It is of course not interesting, but these two ImplicationLinks are indeed connected via "eat", "weird" and "cockroach", so that is why I think that in the PLN data only 1-gram patterns are worth mining.

Nil Geisweiller

Jun 21, 2017, 1:50:55 AM

to Shujing Ke, Ben Goertzel, Nil Geisweiller, opencog

Hi,

On 06/21/2017 03:29 AM, Shujing Ke wrote:
> Hi, Ben and Nil,
>
> Thanks for all your responses. I may be a bit slow this week - it is too
> warm here and my baby is sick, he barely eat and drink anything since
> yesterday morning.
>

> *1. About the output format and TV of patterns*

> The pattern miner will output the raw patterns found in the input data
> (without further processing), because different modules in OpenCog and
> different applications may require different output formats; there
> shouldn't be only one output format. For now we can base our discussion
> on the raw pattern format. After we make sure the contents of the patterns
> are right, we can discuss the output formats for different modules. If I
> have time then, I can implement them; if I don't, I think it should also be
> easy for each module's developer to turn the raw patterns into the format
> they want. It is better for this to live in another layer outside the
> pattern miner, which makes it more convenient for each module to modify
> the pattern format it needs in future. Otherwise, whenever a module wants
> to change some format, it has to modify the pattern miner core.

OK. Although I think we still need to come up soon with a way to pass
the results, including frequencies, interestingness, etc, as atoms in
the atomspace, as opposed to writing the results in a file.
> *5. A suggestion to make up a very simple tiny test data file *

> I suggest Nil make up a simple test data file just to test whether the
> output patterns are what you want and whether the frequency counts are
> correct. For example, I made up a simple data set before, the
> ugly-man-drink-soda file, which contains 10 men and 10 women; among them
> 5 women and 5 men are ugly, and also 5 women and 5 men drink soda. It is
> expected to find the pattern "ugly men drink soda". For such a tiny file,
> we can actually check every output pattern and its count to see if there
> is any bug. If it passes, then we can apply the miner to a big corpus.
> Otherwise, there are too many outputs for a big corpus and it is hard to
> examine the result.

Agreed. That'll make a good second unit test.

Nil

Ben Goertzel

Jun 21, 2017, 6:45:41 AM

to Nil Geisweiller, Shujing Ke, opencog

On Wed, Jun 21, 2017 at 1:50 PM, Nil Geisweiller
<ngei...@googlemail.com> wrote:
> OK. Although I think we still need to come up soon with a way to pass the
> results, including frequencies, interestingness, etc, as atoms in the
> atomspace, as opposed to writing the results in a file.

Agreed...

Shujing Ke

Jun 24, 2017, 4:23:44 PM

to Ben Goertzel, Nil Geisweiller, opencog

Yes, to output the patterns, there are two ways:

1. return the atomspace that stores the patterns.

2. return a HandleSeq of patterns.

Each raw pattern will be quoted within a PatternLink.

Because the number of patterns is huge, I think it makes sense to give an input parameter specifying the top percentage of patterns to output, e.g. only output the top 10% most frequent patterns, or output the top 7% most frequent and 20% most interesting patterns. Or directly specify a frequency threshold, like frequency > 15.
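
The two kinds of output filter proposed here (a top fraction, or an absolute frequency threshold) could look roughly like the Python sketch below. The `select_patterns` function, its parameter names and the sample frequencies are all made up for illustration; the pattern miner's real interface is not shown in this thread.

```python
def select_patterns(patterns, top_fraction=None, min_frequency=None):
    """Filter (pattern, frequency) pairs, most frequent first.

    top_fraction:  keep only this fraction of the ranked patterns (e.g. 0.10)
    min_frequency: keep only patterns with frequency strictly above this value
    """
    ranked = sorted(patterns, key=lambda pf: pf[1], reverse=True)
    if min_frequency is not None:
        ranked = [pf for pf in ranked if pf[1] > min_frequency]
    if top_fraction is not None and ranked:
        keep = max(1, int(len(ranked) * top_fraction))
        ranked = ranked[:keep]
    return ranked

# Ten hypothetical patterns p0..p9 with made-up frequencies.
frequencies = [3, 40, 16, 15, 8, 22, 1, 9, 30, 12]
patterns = [(f"p{i}", freq) for i, freq in enumerate(frequencies)]

top_10_percent = select_patterns(patterns, top_fraction=0.10)
above_15 = select_patterns(patterns, min_frequency=15)
print(top_10_percent)
print([name for name, _ in above_15])
```

A second ranking key (interestingness) could be filtered the same way and the two selections combined, which would cover the "top 7% frequent and 20% interesting" case.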

My baby is still sick, with serious diarrhea; he does not want to eat much and needs to go to the hospital from time to time. So I probably won't get much work done this week. I hope he will be better next week.

Shujing

Ben Goertzel

Jun 25, 2017, 3:33:42 AM

to opencog, Nil Geisweiller

Hi Shujing,

Well it seems that the mathematical basis of the interestingness
evaluation is the same whether we view this as a 3-gram pattern or as
a 1-gram pattern

So I guess the answer is that, yeah, the interestingness evaluation
has to be able to look at multiple variables within a "1-gram" and
also across different "grams" ...

There may be surprisingness

-- within a single "gram"

-- among multiple "grams"

-- involving, say, 2 variables in one gram, and then 2 variables in another gram

(where i'm using "gram" as a shorthand for "set of Atoms supervened
over by a single root link")
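
To illustrate why the same math can apply within one gram or across several, here is a hedged Python sketch of one simple surprisingness score: the normalized gap between a conjunction's observed probability and the probability expected if its components were independent. This is an assumed stand-in chosen for the example, not the formula the pattern miner uses for Surprisingness_I or Surprisingness_II; the components may equally be separate root links or sub-expressions inside a single root link.

```python
def surprisingness(observed_count, component_counts, total):
    """Normalized gap between observed and independence-expected probability.

    observed_count:   how often the whole conjunction occurs
    component_counts: how often each component occurs on its own
    total:            size of the universe being counted over
    """
    observed_p = observed_count / total
    expected_p = 1.0
    for c in component_counts:
        expected_p *= c / total
    largest = max(observed_p, expected_p)
    if largest == 0:
        return 0.0  # nothing observed and nothing expected: not surprising
    return abs(observed_p - expected_p) / largest

# 20 individuals; 10 are men, 10 are ugly, 10 drink soda; 5 are all three.
score = surprisingness(5, [10, 10, 10], 20)

# A conjunction occurring exactly as often as independence predicts scores 0.
baseline = surprisingness(5, [10, 10], 20)
print(score, baseline)
```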

ben

Shujing Ke

Jun 25, 2017, 10:06:27 AM

to opencog, Ben Goertzel, Nil Geisweiller

Yes, the math is basically the same. I will try to implement it based on the same math.

It would be nice to have a simple small corpus for experimenting with this interestingness implementation.

My son seems better today; he is starting to be playful again and is more willing to eat and drink. But he still has diarrhea.

I will implement exporting patterns first, and then the interestingness evaluation inside 1-gram patterns once Nil provides the small test corpus.

Thanks,

Shujing


Nil Geisweiller

Jun 26, 2017, 1:30:41 AM

to Shujing Ke, Ben Goertzel, Nil Geisweiller, opencog

On 06/24/2017 11:23 PM, Shujing Ke wrote:
> Yes, to output the pattern. There are two ways:
> 1. return the atomspace that store the patterns.
> 2. return a HandleSeq of patterns.

As far as the C++ API is concerned a HandleSeq, or an
OrderedHandleSet/UnorderedHandleSet depending on whether we want to
retain the order or not. Likewise for the scheme interface, a ListLink
or SetLink.

>
> Each raw pattern will be quoted within a PatternLink.

Oh you mean

PatternLink
  QuoteLink
    <pattern>

?

Why not. That means that the postprocessing to turn patterns into useful
atoms would occur outside of the pattern miner. Maybe we should do that
for now, then once the post-processing is well understood it can be
moved inside to the pattern miner.

Then we still need to represent the pattern scores, I've never
experimented with proto-atoms but I think that would be the way to go,
either that or using Evaluation or Execution, such as

Execution <1 1>
  GroundedSchema "pattern-count"
  List
    PatternLink
      ...
  Number "42"

telling there are 42 instances of that pattern. I suppose it is called
frequency in the context of the pattern miner (I don't understand why,
BTW).

>
> Because the numbers of patterns are huge, I think it make sense to give
> an input parameter to specify the top percentage of patterns to be
> output. e.g.: only output top 10% frequent patterns; output top 7%
> frequent and 20% interesting patterns. Or directly specify the number
> of frequency : like frequency > 15

I suppose for now the frequency (which, as I remember, is called the
support, right?) would be enough, as it would control computational
effort as well.
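
To make the two filtering options discussed above concrete, here is a minimal C++ sketch of a frequency threshold and a top-percentage cut. It is only an illustration under stated assumptions, not the Pattern Miner's code: `ScoredPattern`, `filter_by_frequency` and `top_percent_by_frequency` are hypothetical names, and a plain string stands in for the pattern atom.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical stand-in for a mined pattern and its score; the real
// Pattern Miner stores patterns as atoms, not strings.
struct ScoredPattern {
    std::string pattern;  // placeholder for the pattern atom
    unsigned frequency;   // number of instances found (the "support")
};

// Keep only patterns whose frequency is strictly greater than
// min_frequency, e.g. min_frequency = 15 for "frequency > 15".
// The input order is preserved.
std::vector<ScoredPattern> filter_by_frequency(
    const std::vector<ScoredPattern>& patterns, unsigned min_frequency)
{
    std::vector<ScoredPattern> kept;
    std::copy_if(patterns.begin(), patterns.end(), std::back_inserter(kept),
                 [min_frequency](const ScoredPattern& p) {
                     return p.frequency > min_frequency;
                 });
    return kept;
}

// Keep the top `percent` (clamped to 0..100) of patterns by frequency.
// The cut-off is rounded up, so any percent > 0 on a non-empty input
// keeps at least one pattern.
std::vector<ScoredPattern> top_percent_by_frequency(
    std::vector<ScoredPattern> patterns, double percent)
{
    // Sort by descending frequency; ties keep their input order.
    std::stable_sort(patterns.begin(), patterns.end(),
                     [](const ScoredPattern& a, const ScoredPattern& b) {
                         return a.frequency > b.frequency;
                     });
    const double clamped = std::max(0.0, std::min(100.0, percent));
    std::size_t keep = static_cast<std::size_t>(
        std::ceil(static_cast<double>(patterns.size()) * clamped / 100.0));
    keep = std::min(keep, patterns.size());
    patterns.erase(patterns.begin() + static_cast<std::ptrdiff_t>(keep),
                   patterns.end());
    return patterns;
}
```

A threshold on frequency can be applied while mining and so also limits computational effort, whereas a top-percentage cut needs all frequencies before it can sort, which is one reason to start with the threshold alone.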

>
> My baby is still sick, having serious diarrhea, does not want to have
> much food, need to go to hospital from time to time. So I probably won't
> get much work done this week. Hope he will be better next week.

Sorry to hear about that; I know this can be very distressing.

Nil

Nil Geisweiller

unread,

Jun 26, 2017, 1:43:27 AM6/26/17

to Shujing Ke, opencog, Ben Goertzel, Nil Geisweiller

Shujing,

On 06/25/2017 05:06 PM, Shujing Ke wrote:
> My son seems better today, starts to be playful again and more willing
> to eat and drink. But still have diarrhea.

Great!

>
> I will implement exporting patterns first. And then the interestingness
> evaluation inside 1-gram pattern when Nil gives the small test corpus.

BTW, it's going to be a while before I can hand you the corpus, as
I'm attempting to generate it from an actual inference control learning
experiment. I'd say a week or so.

Meanwhile if you could take a look at

https://github.com/opencog/opencog/issues/2787

specifically the first item, which is a question for you. Also, having the
pattern miner unit test pass would be great. As I explain at the end of
the issue, I may prefer to make these changes myself, as they will help me
get familiar with the pattern miner code. Of course, if you'd rather
take care of them, for educational purposes or whatnot, you are free to
do so as well.

Nil


Shujing Ke

unread,

Jul 9, 2017, 4:34:33 PM7/9/17

to Nil Geisweiller, opencog, Ben Goertzel

Hi, thanks,

I just implemented quoting of the output patterns and returning the result AtomSpace:

1. There is an "if_quote_output_pattern" option in the config file; if it is set to true, the patterns will be quoted.
2. The quoting link type can be defined with "output_pattern_quoted_linktype" in the config file; the default is MinedPatternLink.
3. The Frequency, InteractionInformation, and Surprisingness I and II values are stored under the "PatternValues" key of the MinedPatternLink. They can be queried with MinedPatternLink->getValue(PatternValuesHandle).
4. After mining is finished, the AtomSpace containing all the patterns can be obtained through the interface getResultAtomSpace(), and the patterns themselves through the AtomSpace interface get_handles_by_type (MINED_PATTERN_LINK ....).

For example:

(MinedPatternLink
  (EvaluationLink
    (PredicateNode "background")
    (ListLink
      (VariableNode "$var_1")
      (ConceptNode "solo_singer")
    )
  )
  (EvaluationLink
    (PredicateNode "occupation")
    (ListLink
      (VariableNode "$var_1")
      (ConceptNode "Singer-songwriter")
    )
  )
)
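
As a rough model of how the values attached to a quoted pattern could be read back, here is a self-contained C++ sketch. It does not use the OpenCog headers: `MinedPattern`, `make_mined_pattern`, `get_value` and the `ValueIndex` enum are hypothetical stand-ins, and the order of the four values inside the "PatternValues" entry is an assumption taken from the order in which they are listed above.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins; these are NOT the OpenCog types.
using Key = std::string;

// Assumed layout of the vector stored under the "PatternValues" key.
enum ValueIndex {
    FREQUENCY = 0,
    INTERACTION_INFORMATION = 1,
    SURPRISINGNESS_I = 2,
    SURPRISINGNESS_II = 3
};

const Key PATTERN_VALUES_KEY = "PatternValues";

// Models a MinedPatternLink: the quoted pattern body plus key/value pairs.
struct MinedPattern {
    std::string body;                           // placeholder for the quoted pattern
    std::map<Key, std::vector<double>> values;  // values attached to the link

    // Returns the values stored under `key`, or nullptr if there are none.
    const std::vector<double>* get_value(const Key& key) const {
        auto it = values.find(key);
        return it == values.end() ? nullptr : &it->second;
    }
};

// Builds a mined pattern with its four scores stored under one key.
MinedPattern make_mined_pattern(const std::string& body, double frequency,
                                double interaction_information,
                                double surprisingness_1,
                                double surprisingness_2)
{
    MinedPattern p;
    p.body = body;
    p.values[PATTERN_VALUES_KEY] = {frequency, interaction_information,
                                    surprisingness_1, surprisingness_2};
    return p;
}
```

Keeping all four scores under a single key means one lookup per pattern, at the cost of callers having to agree on the index of each score.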

I had a quick look at the Pattern Miner cleanup issue. I will try to fix some of the items along the way when I modify the related code. Of course, you are very welcome to fix things you think should be fixed; it would just be nice if you could notify me before you start.

I am still making a lot of changes, and they often conflict with the small fixes you and Linas have made, usually in code I have since modified or deleted, so I often cannot merge them. Recently I spent hours resolving all the conflicts. If the cleanups are not really important, core, or urgent, maybe we can postpone them until I finish the current stage of the pattern mining work?

I picked some instances from the corpus you gave me earlier to experiment with the new interestingness evaluation.

Thanks,

Shujing


Nil Geisweiller

unread,

Jul 9, 2017, 11:55:46 PM7/9/17

to Shujing Ke, opencog, Ben Goertzel

Hi Shujing,

On 07/09/2017 11:34 PM, Shujing Ke wrote:
> Hi, thanks,
>
> I just implemented quoting the output patterns and the return AtomSpace:
>

> 1. There is a "if_quote_output_pattern" in config file, if it is set to
>    true, the patterns will be quoted.
> 2. The quote link type can be define with
>    "output_pattern_quoted_linktype" in the config file, the default is
>    MinedPatternLink.
> 3. The Frequency, InteractionInformation, Surprisingness I and II value
>    are stored in the keyvalue pair of "PatternValues" of the
>    MinedPatternLink. Can be query by:
>    MinedPatternLink->getValue(PatternValuesHandle).
> 4. Also after mining is finished, the AtomSpace contains all the
>    patterns can be get by the interface: getResultAtomSpace(). By
>    AtomSpace interface: get_handles_by_type (MINED_PATTERN_LINK ....).

That looks good, I think we can stick with this API for the time being.

Sure no problem, just notify me when you are done with work on the
pattern miner.

>
> I picked some instances from the corpus you gave me early to experiment
> with the new interestingness evaluation for them.

More will come soon (end of this week I think).

Thanks,
Nil

